Quantitative Analysis and Performance Evaluation of Target-Oriented Replication Strategies in Cloud Computing

Data replication copies the same data to multiple locations so that no information is lost in case of failures, without any downtime. Dynamic data replication strategies (which determine the location of replicas at run time) in clouds should optimize the key performance indicator parameters, such as response time, reliability, availability, scalability, cost, and performance. To fulfill these objectives, various state-of-the-art dynamic data replication strategies have been proposed based on several criteria and reported in the literature, along with their advantages and disadvantages. This paper provides a quantitative analysis and performance evaluation of target-oriented replication strategies based on target objectives. In this paper, we try to find out which target objectives are most addressed, which are moderately addressed, and which are least addressed in target-oriented replication strategies. The paper also includes a detailed discussion of the challenges, issues, and future research directions. This comprehensive analysis and performance evaluation will open a new door for researchers in the field of cloud computing and will be helpful for the further development of cloud-based dynamic data replication strategies, with the goal of a technique that addresses all attributes (target objectives) effectively in one replication strategy.


Introduction
Over the last few years, cloud computing has had a significant impact on the field of storage systems. It is recognized as web-based administration of configurable, parallel, and adaptive systems and has emerged as the most recent approach for accessing, managing, and controlling massive data distributed across various geographical areas. The main purpose of cloud computing is to provide simplified and efficient on-demand network access, along with service to a pool of shared virtualized processing assets, based on a pay-as-you-go agreement [1][2][3][4]. Besides providing data availability, it also improves load balancing, fault tolerance, and scalability. Moreover, it minimizes job execution time and bandwidth consumption and improves performance. The services offered by the cloud include infrastructure flexibility, cost control, faster application deployment, adaptation of cloud resources to real needs, and improved profitability. In distributed data centers, there is a huge demand to store large amounts of data on cloud infrastructure due to the integration of computer networks, servers, storage, and numerous related software systems [5]. The growing number of cloud-facilitated applications that are powered by cloud-facilitated database systems is generating and consuming a tremendous volume of data at [...] there is an update/write operation, the same update should be immediately propagated to the other replicas, too. The main idea is to recover lost data by utilizing these replicated copies from the cloud [26]. Data replication is considered a performance-enhancing technique for cloud storage frameworks and has been widely adopted by large-scale cloud storage systems. In large-scale cloud storage systems, it is the only solution that provides data availability along with performance when disasters (failures) occur.
By utilizing these numerous replicated copies, data replication guarantees high information sharing and low access latency, along with improved system load balancing. One of the biggest advantages of data replication is that it consistently decreases the response time and improves reliability [3,27]. Other advantages include faster data access, lower access latency, minimal network delays (user waiting time), and reduced bandwidth usage (cloud system bandwidth capacity utilization) [5,28]. Consequently, data replication is used in clouds to improve the performance (e.g., read and write delay) of applications that access the data [8]. However, the risk of node failure in cloud storage frameworks within a data-intensive application is present around the clock [25]. Proper implementation and execution of data replication mechanisms over cloud services promote availability, fault tolerance, and failure recovery [29]. Therefore, keeping the data at more than one site increases availability, and a request can find the data close to the site where it originated, which shortens the service request time and improves the overall performance of the system.
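The two ideas in this introduction, propagating every update to all replicas and serving a request from the closest available copy, can be illustrated with a minimal sketch. The class name `ReplicatedStore`, the site names, and the latency figures are hypothetical and chosen only for illustration; real cloud stores use far more elaborate protocols.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedStore:
    """Toy key-value store that keeps a full copy of the data at every site."""

    # Hypothetical site names mapped to their local copy of the data.
    sites: dict = field(default_factory=dict)

    def add_site(self, name):
        self.sites[name] = {}

    def write(self, key, value):
        # Eager propagation: the update is applied to every replica.
        for replica in self.sites.values():
            replica[key] = value

    def read(self, key, latency_ms):
        # Serve the request from the live replica with the lowest latency.
        candidates = [
            s for s in self.sites if s in latency_ms and key in self.sites[s]
        ]
        if not candidates:
            raise KeyError(key)
        nearest = min(candidates, key=lambda s: latency_ms[s])
        return nearest, self.sites[nearest][key]

    def fail(self, name):
        # Simulate a site failure by dropping its copy.
        self.sites.pop(name, None)


store = ReplicatedStore()
for site in ("eu", "us", "asia"):
    store.add_site(site)
store.write("report.csv", "v1")

# Assumed client-to-site latencies in milliseconds.
latency = {"eu": 20, "us": 90, "asia": 150}
print(store.read("report.csv", latency))  # served from the nearest site
store.fail("eu")
print(store.read("report.csv", latency))  # still available from another site
```

After the nearest site fails, the read is answered by the next-closest replica, which is the availability benefit described above; the price is that every write touches all sites.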

Research Motivation
Extensive research is ongoing to optimize various types of dynamic replication strategies. We analyze and evaluate the target-oriented replication strategies in large-scale cloud storage systems based on target objectives, which are represented through various attributes discussed in previous works [30][31][32]. The attributes associated with these target objectives are Availability, Reliability, Performance (Storage Space, Storage Cost, Bandwidth Consumption, Response Time), Fault Tolerance, Load Balance, Scalability, Elasticity, Consistency, and Cost. The main motivation for this research is to identify the target objectives related to dynamic replication strategies and elaborate on their utilization in each replication strategy.

Paper Organization
This paper is organized as follows. In Section 2, we present the review methodology. In Section 3, we present a taxonomy for data replication strategies. In Section 4, we present a taxonomy for dynamic cloud computing replication strategies and a target-oriented taxonomy for dynamic replication in cloud computing, along with the relationships of target objectives based on attributes. The section also contains the quantitative analysis summary. In Section 5, we present a comparison and evaluation in detail. In Section 6, we present the challenges of replication strategies in clouds. In Section 7, we present the least addressed target objectives of dynamic replication strategies in clouds, their challenges, issues, and future research directions. In Section 8, we conduct our discussion. In the final section, we present our concluding remarks.

Research Methodology Used
In the first phase of our research methodology, we selected the research papers for our critical analysis and evaluation by searching various databases. In the second phase, we included and excluded papers based on their title, abstract, and main content. In the next phase, we checked the accepted papers against each formulated research objective. Finally, in the last phase, which was based on reading the full content, the main papers were collected for our quantitative analysis and performance evaluation.
In this section, we discuss the research questions related to our research, the sources of information, the search criteria, the quality assessment, and the review phases.

Research Questions
In this section, we present the research questions we adopted in our critical analysis and performance evaluation. The motivation behind each research question is shown in Table 1. Mainly, the aim is to identify and evaluate the various data replication strategies in previously published articles/studies and to describe their importance in cloud computing environments.
RQ2. What are the main target objectives a replication strategy should possess?
Here, we discuss various dynamic replication strategies based on different categories, especially the target-based dynamic replication strategies, to understand the need for each replication strategy.
RQ3. What are the most used replication strategies, and how are they applied in the cloud replication area?
RQ4. What attributes might a replication strategy consider to meet the target objectives?
This research aims to provide different target-based objectives of dynamic replication strategies and their dependent attributes for best optimization. The relationship will help to understand the utilization of different algorithms for the best performance.
RQ5. What is the relationship between target objectives and their related parameters?
RQ6. What is the relationship between different parameters, and what are their metrics?
Various research papers need to be identified from different replication strategy categories to reveal vital research problems. The research presents the quantitative analysis for performance evaluation of target-oriented replication strategies in cloud computing. There is a need to develop a technique that addresses all attributes (target objectives) effectively in one replication strategy.
RQ7. What are the main metrics used for performance evaluation purposes?
RQ8. What are the key results obtained?
This research also aims to identify the main issues and challenges of existing target-oriented replication strategies, along with future directions to ensure optimal services. The questions discussed here will help in the identification of future research areas.
RQ9. What are the main challenges and open issues of replication in cloud computing?

Sources of Information
We searched various digital library sources (Scopus, Web of Science, Google Scholar, etc.) to find the relevant papers related to cloud-based replication. We searched journals, conferences, and books to extract the relevant research papers. The following databases have been used in our search: Springer, ScienceDirect, Scopus, Google Scholar, ACM Digital Library, IEEE Xplore, and Taylor & Francis.

Search Criteria
We searched the above-mentioned databases using the specific keywords "Data Replication Strategies/Techniques" and "Cloud Computing". In this research, we use the title and abstract of the research papers to get our results. We tried various related keywords that matched our target results, such as "cloud-based replication strategies" and "dynamic data replication strategies". Then, we refined the search by adding the "Target Objective" prefix, as in "Target objective cloud-based replication strategies" or its synonym "target-oriented cloud-based replication strategies". We also searched using various parameters/attributes of cloud computing replication, for example, "Performance", as in "Performance Analysis of Data Replication Strategies in Cloud".

Quality Assessment
When searching for articles related to our topic, we applied the inclusion and exclusion criteria listed below.
For Inclusion, we followed:
1. Articles that clearly describe the target objectives of replication strategies for cloud computing.
2. Peer-reviewed articles in the English language.
3. Articles published in reputable journals, conferences, and magazines.
4. Articles published from 2011 to 2019.
For Exclusion, we followed:
1. Articles that do not focus on dynamic replication strategies in the cloud.
2. Articles that are not related to the research questions.
3. Articles whose full text is not available.
4. Articles that have common challenges and references.

Review Phases
After defining the search keywords, our four-stage review process is summarized as follows:
• First, articles were searched based on the defined keywords (mentioned in the search criteria), which initially yielded 109 articles in total.
• Then, articles that did not meet the inclusion and exclusion criteria were excluded. This step reduced our set to 53 articles.
• Then, the research question objectives were used for further filtration of articles. This step reduced our set to 28 articles.
• Finally, articles were evaluated based on full-paper reading, and the total number of papers finalized for this research was 22.
For RQ1-RQ5, RQ8, and RQ9, we collected a total of 108 papers, which also include the related surveys. For RQ6 and RQ7, we collected a total of 22 papers.
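The staged screening described in the review phases can be sketched as a simple filter pipeline. The record fields (`year`, `peer_reviewed`, `dynamic_cloud`, `answers_rq`, `passes_full_read`) and the five sample records are hypothetical and only illustrate the mechanics; they do not reproduce the actual corpus or its counts.

```python
def screen(articles, stages):
    """Apply (name, predicate) stages in order and record how many articles survive."""
    counts = [("initial search", len(articles))]
    remaining = list(articles)
    for name, keep in stages:
        remaining = [a for a in remaining if keep(a)]
        counts.append((name, len(remaining)))
    return remaining, counts


def record(id_, year, peer_reviewed, dynamic_cloud, answers_rq, passes_full_read):
    # Helper that builds one hypothetical article record.
    return {
        "id": id_,
        "year": year,
        "peer_reviewed": peer_reviewed,
        "dynamic_cloud": dynamic_cloud,
        "answers_rq": answers_rq,
        "passes_full_read": passes_full_read,
    }


# Made-up records for illustration only.
articles = [
    record(1, 2015, True, True, True, True),
    record(2, 2009, True, True, True, True),    # outside 2011-2019
    record(3, 2018, True, False, True, True),   # not about dynamic cloud replication
    record(4, 2013, True, True, False, True),   # unrelated to the research questions
    record(5, 2017, True, True, True, False),   # dropped after full reading
]

stages = [
    (
        "inclusion/exclusion criteria",
        lambda a: 2011 <= a["year"] <= 2019 and a["peer_reviewed"] and a["dynamic_cloud"],
    ),
    ("research-question filter", lambda a: a["answers_rq"]),
    ("full-paper reading", lambda a: a["passes_full_read"]),
]

selected, counts = screen(articles, stages)
for stage_name, n in counts:
    print(f"{stage_name}: {n}")
```

Recording the count after every stage mirrors how the review reports the shrinking article set at each phase.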
In this paper, we first introduce a taxonomy of replication strategies, along with a survey of the related surveys. Then, we provide the taxonomy of dynamic cloud replication strategies, along with a survey of the related surveys. Finally, the focus of this paper is target-oriented replication strategies and their taxonomy, along with a detailed investigation.

Data Replication Strategies
In the last few years, there has been a huge contribution from many researchers, scientists, and academicians in the field of data replication. This contribution not only offers solutions to the basic issues of replication strategies but also provides a smooth way to implement these strategies in different distributed architectures. The main intent is to benefit from replication strategies in the various types of distributed architectures mentioned in Refs. [33,34], which include Distributed Database Management Systems, Peer-to-Peer Systems, Data Grids, the Worldwide Web, Distributed Geographic Information Systems, and others, especially Cloud Computing [35]. At present, many efforts have rooted replication strategies deeply in cloud computing architectures. On this subject, various researchers have contributed vigorously. Several researchers have contributed to implementation-related issues, several to optimization, and several have provided reviews that include classifications and taxonomies of replication strategies for cloud-based structures using different criteria. Figure 1 depicts the taxonomy of data replication strategies. In this section, we present a taxonomy of replication strategies based on a distributed architecture (shown in Figure 2). We have categorized data replication into (1) grid computing replication strategies, (2) other distributed architecture replication strategies, and (3) cloud computing replication strategies.

Grid Computing Replication Strategies
A data grid is a cluster of services that provides smooth access to, and modification and transfer of, a substantial amount of data over geographically distributed structures. Hence, massive storage resources are the basic requirement for the storage of data files. To support the storage of data files in these large storage systems, data replication makes a great impact by reducing data access time and by using network and storage resources efficiently for recovery [36]. Grid computing-based replication is utilized in various scenarios, and research on it has considerable future scope.

Related Surveys
Various research efforts have been made on replication strategies related to grid computing. We collected some of the reviews of various replication strategies for grid-based environments, for the sake of understanding the replication strategies concerning grid computing terminology.
Amjad Sher et al. [30] presented an extensive review of grid computing replication strategies. They split the replication strategies into many categories based on the nature and architecture of data grid structures. The survey was targeted at enhancing and improving data availability using dynamic replication strategies in data grids [30].
Hamrouni et al. [32] reviewed the data replication strategies and placed particular emphasis on replica selection strategies. Replica selection strategies act as an important data management technique, mostly used in data grid structures for the enhancement of network performance, file access patterns, user or job access behavior, and file correlations, as well as in the prediction of future behavior. The strategies were discussed, along with their advantages and disadvantages [32].
Naseera et al. [37] presented a comprehensive survey of the issues and challenges involved in grid environment-based data replication, with a focus on concerns such as replica consistency, replica synchronization, and replica maintenance. This survey provides a general review of data replication related to various important aspects, namely replica creation/modification, replica selection, the optimal number of replicas, and replica consistency. They also mentioned the limitations, as well as future enhancements [37].
Vashisht et al. [38] classified and analyzed various asynchronous replica consistency strategies based on different criteria, such as the level of abstraction, load balancing, update propagation, fault tolerance, topology, location, check-pointing, and many more [38].
Tos et al. [39] presented a survey of the latest dynamic grid-based data replication strategies. The classification criteria for their strategy are based on target data grid architecture. Their work includes the survey of the strategies and their feature comparison using important metrics for evaluation [39].
Hamrouni et al. [40] presented a similar work for replication strategies in the grid computing domain, particularly those using data mining techniques. This study describes the use of data mining techniques in grid-based setups to understand and evaluate historical data [40].
Mansouri et al. [41] presented a survey that examines which attributes each replication algorithm considers and which it ignores. They described the important factors to facilitate the future comparison of data replication algorithms and presented some interesting discussions about future work, along with open research challenges [41].
Souravlas et al. [42] provided a general summary of the latest replication strategies based on the selection criteria (geography, space, or time) for the data files to be replicated. Moreover, they mentioned the pros and cons of each strategy and evaluated the performance based on a set of parameters [42].
Some of the latest research to enhance replication strategies in the field of grid computing is addressed in Ref. [43] for replica creation, Refs. [44,45] for replica placement, and Ref. [46] for distributed database systems.

Other Distributed Architecture-Based Replication Strategies
Replication strategies related to other distributed architectures cover distributed database management systems, peer-to-peer systems, the worldwide web, utility-based distributed systems, and many applications, such as mobile systems, artificial intelligence, and business applications. These systems are mostly need-based and application-oriented. There is always a strong connection between the applications and the distributed systems. Several applications are developed based on system needs, which keep growing with time. A lot of research has been done, beginning with the simple creation of an architecture based on the number of requests initiated by a client to a server. Such basic architectures are unable to handle large numbers of requests, and there is always a performance constraint in maintaining the response time and using the network bandwidth efficiently. To some extent, mobile agents strive to overcome these drawbacks but cannot fully keep up with the growing demand and technology setups [47]. Other architecture-based replication strategies are utilized in various distributed settings, and hence more effort should be devoted to their performance.
Peer-to-peer (P2P) systems are mainly designed for read-only database applications and data grid systems deal with read-only queries, whereas other systems deal with transactional queries related to databases. The benefits of replication in read-only database applications can be neutralized by the overhead of maintaining consistency among multiple replicas if the application needs to process update queries [48]. The latest roles of various other architecture-based applications, their utilization in various domains, and analytics can be found in Reference [49]. Replication related to applications such as mobile systems, artificial intelligence, and business applications is mostly dependent on storage utilization. The application domains decide which storage systems to use and what the processing techniques should be, while keeping the storage restrictions in view.

Related Surveys
Various research efforts have been made on replication strategies related to other distributed architectures. We collected some of the reviews of various replication strategies for other distributed architecture-based environments, for the sake of understanding the replication strategies with respect to other distributed architecture terminology, such as peer-to-peer (P2P) systems, database management systems (DBMS), and mobile computing.
Sushant Goel et al. [34] presented an extensive review of distributed storage and data distribution systems, where they split the distributed systems based on their architecture into four subclasses, namely (a) Distributed database management systems, (b) Peer-to-peer systems, (c) Data grids, and (d) Worldwide web. Furthermore, their contribution also includes a detailed further classification of these four subclasses [34].
Spaho et al. [50] presented a survey of P2P systems. The survey is based on the classification of replica placement strategies using the criteria of site selection and replica placement. These two criteria provide depth in the comprehensive classification of P2P systems [50].
Some of the latest research to enhance replication strategies in other distributed architectures is addressed in Reference [51] for Cloud-P2P environments, Reference [52] for document-oriented NoSQL (Not only SQL) systems, Reference [53] for replica selection in the Internet of Things (IoT), Reference [54] for cost-aware heterogeneous cloud data centers, and Reference [55] for mobile ad hoc network structures.

Cloud Computing Replication Strategies
In cloud-based replication, the data files are split into multiple blocks over the distributed network. The aim is to have multiple copies (replicas) of the same data at various distributed data nodes. However, the network dependency within a data-intensive application causes node failures in a large cloud storage system. The contributing network factors include bandwidth, node failure, and untrustworthy networks. If the only node holding a data file fails, then the whole data file is lost. Therefore, there is always a need for data availability [25].
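The effect of keeping several copies of each block can be illustrated with a short sketch. The round-robin placement rule, the node names, and the function names `place_blocks` and `lost_blocks` are assumptions made for illustration; they are not taken from any particular cloud storage system.

```python
def place_blocks(num_blocks, nodes, replication_factor):
    """Round-robin placement: each block gets copies on distinct consecutive nodes."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication_factor)]
        for b in range(num_blocks)
    }


def lost_blocks(placement, failed_nodes):
    """Return the blocks whose every copy sits on a failed node."""
    failed = set(failed_nodes)
    return [b for b, holders in placement.items() if set(holders) <= failed]


# Hypothetical four-node cluster storing a file split into eight blocks.
nodes = ["n0", "n1", "n2", "n3"]
single = place_blocks(8, nodes, replication_factor=1)
triple = place_blocks(8, nodes, replication_factor=3)

print(lost_blocks(single, ["n1"]))  # blocks stored only on n1 are gone
print(lost_blocks(triple, ["n1"]))  # every block still has a live copy
```

With a single copy per block, one node failure already makes the file incomplete; with three copies, the file survives until all holders of some block fail together.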
Machine learning refers to a collection of algorithms that can detect patterns in data and predict outcomes in the event of a decision. Machine learning algorithms have been used to avoid or detect attacks and security problems, including cloud vulnerabilities, in a variety of ways [56]. The use of machine learning and its applications in cloud computing and related environments has been discussed in some of the most recent related works [57][58][59][60][61].
In cloud computing, different users exchange sensitive data over the cloud, and failures are possible. Data fragmentation and replication algorithms can therefore help improve data protection. This led to the idea of safe data replication (SDR), in which attackers are unable to determine the positions of replicas and the replication process is secure [62]. Machine learning techniques are used in replication to secure the clouds [63]. In Reference [64], the authors use machine learning to implement a multi-objective optimization data placement strategy in large-scale networked storage systems that considers data protection and retrieval time. As a result, the replication process can run faster and be more secure.
Data replication techniques in clouds are broadly divided into two basic categories, namely the static replication mechanism and the dynamic replication mechanism [5], and their summary is presented in Table 2.

Table 2. Static data replication versus dynamic data replication.

Static data replication
Brief Description: In static data replication, a predefined set of replicas and host nodes is the key factor in achieving data distribution at multiple sites. It determines the replica node locations at the design phase.
Key Features:
• The static replication strategy follows deterministic policies in which the host nodes and the number of replicas are decided in advance and well defined.
• The static replication strategies are always simple to implement because the number of replicas is constant.
• There is a need to support the random policy to keep the number of active service replicas at the maximum.
Drawbacks:
• They are used less in real scenarios because of their predetermined nature.
• More active service replicas guarantee more performance, but performance cannot be obtained at a high operation cost.

Dynamic data replication
Brief Description: In dynamic data replication, the key factor in achieving data distribution at multiple sites is its automatic/adaptive nature of creating and removing replicas based on user behavior and network topology. It determines the locations of replica nodes at run time.
Key Features:
• The dynamic strategies create and remove replicas by default based on changes in storage capacity, bandwidth, and user access patterns (adaptive in nature).
• These strategies are not easy to implement because the number of replicas is variable (based on heterogeneous workload).
• Being intelligent in nature, dynamic data replication is designed to make smart choices about the location of the data based on currently available information.
Drawbacks:
• It is very difficult to control and accumulate the runtime information of all the data nodes in a complex cloud setup.
• It takes a lot of effort to maintain data file consistency effectively.
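The contrast between a fixed, design-time replica count and a run-time, load-driven one can be shown with a minimal sketch. The sizing rule (one replica per `capacity_per_replica` requests per second, clamped to a range), the default values, and the load trace are illustrative assumptions rather than a strategy from the surveyed literature.

```python
import math


def static_replica_count(configured=3):
    """Static policy: the replica count is fixed at design time."""
    return configured


def dynamic_replica_count(requests_per_sec, capacity_per_replica,
                          min_replicas=1, max_replicas=8):
    """Dynamic policy: size the replica set to the currently observed load."""
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# Hypothetical load trace (requests per second) for one data file.
trace = [40, 120, 900, 300, 10]
for load in trace:
    static_n = static_replica_count()
    dynamic_n = dynamic_replica_count(load, capacity_per_replica=100)
    print(f"load={load:4d}  static={static_n}  dynamic={dynamic_n}")
```

The static policy is trivial to implement but over-provisions at low load and under-provisions at peaks, while the dynamic policy tracks the workload at the cost of needing run-time information about it.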

Related Surveys
Various research efforts have been made on replication strategies related to cloud computing. We collected some of the reviews of various replication strategies for cloud-based environments for the sake of understanding the replication strategies concerning cloud computing terminology.
We have summarized some of the replication strategies in cloud computing, along with some basic categories based on various types of classifiers.
Milani et al. [35] presented a detailed investigation of data replication strategies in the cloud computing environment. The authors examined the data replication mechanisms in a cloud environment and studied their features and challenges, as well as addressed the relevant issues in data replication. Additionally, they provided a detailed comparison of the data replication strategies in cloud computing [35].
Tabet et al. [65] presented a review of data replication in cloud systems. They divided the data replication of clouds into various categories based on different taxonomies: objective function, static versus dynamic, optimal number of replicas versus dynamic adjustment of the replica factor, customer-centric versus provider-centric, and reactive versus proactive workload balancing [65].
Bhuvaneswari et al. [66] proposed an extensive general review of data replication mechanisms for distributed systems. The review was broadly split into two main categories, consisting of dynamic and static replications irrespective of their architecture types, like grid, cloud, or network [66].
Some of the latest research to enhance replication strategies in the field of cloud computing is addressed in Reference [67] for dynamic cost-aware replication, Reference [68] for cloud/edge-based infrastructures, Reference [69] for mobile edge computing (MEC), Reference [70] for replica placement in geographically distributed clouds, and Reference [71] for replication management in the cloud.
Because their sets of replicas and host nodes are predefined at the design phase, static replication strategies are used less in real scenarios. To overcome this limitation, dynamic replication has emerged as the better alternative due to its adaptive nature of creating and removing replicas based on user behavior and network topology. These attractive characteristics motivated us to select dynamic replication strategies as our topic for further research.

Dynamic Cloud Computing Replication Strategies Taxonomy
In this section, we provide a taxonomy of dynamic cloud computing replication strategies and categorize them based on their service and tasks (shown in Figure 3). Dynamic cloud data replication strategies are divided into the following subcategories:
5. Quality of service (QoS)-oriented replication strategies; and
6. Target-oriented replication strategies.

Service-Oriented Replication Strategies
Service replication supports the non-functional requirements of services, in accordance with Service-Level Agreements (SLAs). These requirements include data availability, response time, and data reliability [72]. In service-oriented replication for cloud systems, the service replicas utilize the storage resource, as well as other resources, such as the central processing unit (CPU), memory, network, and bandwidth. The cost of replication and service dependencies is always high [73]. Therefore, service-oriented replication strategies are generally expensive in nature.

Related Surveys
Various research efforts have been made on replication strategies related to service-oriented computing. We collected some of the reviews of various replication strategies for service-oriented environments for the sake of understanding the replication strategies concerning service-oriented computing terminology.
Slimani et al. [73] presented an extensive review and classification of replication approaches as SoR (Service-oriented Replication) and DoR (Data-oriented Replication) strategies in the cloud computing paradigm, based on whether the service or the underlying data is replicated. The survey reviewed the latest replication techniques with the basic purpose of achieving high availability and QoS in cloud computing paradigms [73].
Mohamed et al. [74] presented a review of service-based replication, its challenges, techniques, types, and algorithms in different distributed setups (service-oriented architecture (SOA), cloud, and mobile). Additionally, they examined and explained the role of replication in promoting various QoS attributes, such as availability, reliability, scalability, performance, and security [74].
Some of the latest research in the field of service-oriented replication for cloud computing is addressed in Reference [72] for replica provisioning policy, Reference [75] for dependency-aware dynamic replication, Reference [76] for replica placement, and Reference [77] for consistency-based replication.

Data-Oriented Replication Strategies
Replicating the underlying data is a commonly used technique to avoid failures and is known as data-oriented replication. The cost of replicating a file is much lower than that of replicating a service. Hence, data-oriented replication strategies are cheaper than service-oriented strategies. The data-oriented replication strategies have been subdivided into three major groups based on the type of cloud application workload: data-intensive workload-based, computationally intensive workload-based, and balanced workload-based [73]. Compared with service-oriented replication, data-oriented replication is easier to implement and more performance-oriented.

Related Surveys
Various research efforts have addressed replication strategies for data-oriented computing. We collected several reviews of replication strategies for data-oriented environments to clarify the terminology and approaches used in this area.
Milani et al. [5] categorized the replication strategies in cloud systems into two main categories: (1) static replication strategies and (2) dynamic replication strategies. Static replication strategies choose the replication nodes and create replicas during the design phase (predetermined), while dynamic replication strategies choose the replication nodes and create replicas at run time (automatically) in response to changes in the user access pattern, bandwidth, and storage capacity [5].
Malik et al. [7] presented a survey on data management and replication approaches. The survey focuses on resource usage and QoS provisioning. They also analyzed the performance, advantages, and disadvantages of data replication and data management in cloud-based setups. Furthermore, the paper discusses the issues and challenges related to consistency, load balancing, scalability, processing, and data placement [7].
Tabet et al. [65] presented a comprehensive survey of data replication in cloud systems. The survey is organized along five dimensions: static versus dynamic, reactive versus proactive workload balancing, provider-centric versus customer-centric, optimal number versus dynamic replica adjustment, and objective function-based [65].
Recent research on data-oriented replication for cloud computing is reported in Reference [71] for replication management and in Reference [78] for replica placement.
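To make the static-versus-dynamic distinction from Milani et al. [5] more concrete, the following minimal Python sketch contrasts a fixed, design-time placement with a run-time decision driven by observed access counts. The node names, the `access_threshold` parameter, and the placement rule itself are illustrative assumptions, not a strategy taken from the surveyed papers.

```python
from collections import Counter

# Static replication: replica locations are fixed at design time (hypothetical nodes).
STATIC_PLACEMENT = {"file_a": ["node1", "node2"]}


def dynamic_placement(access_log, current, access_threshold=3):
    """Return an updated placement based on run-time access counts.

    access_log: list of (file_name, node) pairs observed at run time.
    current: dict mapping file_name -> list of nodes already holding a replica.
    A node that requested a file at least `access_threshold` times
    receives a replica (an assumed, simplified rule).
    """
    counts = Counter(access_log)
    # Copy the current placement so the static configuration is not modified.
    updated = {name: list(nodes) for name, nodes in current.items()}
    for (file_name, node), hits in counts.items():
        holders = updated.setdefault(file_name, [])
        if hits >= access_threshold and node not in holders:
            holders.append(node)
    return updated


log = [("file_a", "node3")] * 3 + [("file_a", "node4")]
print(dynamic_placement(log, STATIC_PLACEMENT))
```

In this toy run, node3 crosses the assumed threshold and receives a replica, while node4 does not; the static placement is left unchanged.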

Energy-Oriented Replication Strategies
Energy-oriented replication strategies are part of green computing, which aims to reduce the environmental impact of computing with a focus on storage, temperature, and energy. Recent research showed that large-scale data centers consume a huge amount of electricity [79]. To minimize energy consumption, the number of active servers should be minimized and the utilization level of replicas should be considered, so that replication strategies reduce energy consumption while maintaining high computation capacity. However, the number of data replicas is directly proportional to energy consumption, which directly affects the performance and the cost of creating and maintaining new replicas [4]. Therefore, the primary issue is to decide the number of required replicas and their location.

Related Surveys
Various research efforts have addressed replication strategies for energy-oriented computing. We collected several reviews of replication strategies for energy-oriented environments to clarify the terminology and approaches used in this area.
You et al. [80] provided a survey of surveys that gives a comprehensive understanding of the current state of energy-efficiency research in cloud-related environments. The surveys were grouped into five categories: surveys on the energy efficiency of the whole cloud, of certain levels of the cloud, of a certain energy efficiency technique, of all energy-efficient strategies, and other energy efficiency-related surveys [80].
Ali et al. [81] presented a taxonomy of energy-efficient techniques for cloud computing. The authors discuss the issues pertaining to the huge energy consumption of cloud data centers and present a taxonomy of these issues, along with their solutions [81].
Recent research on energy-oriented replication for cloud computing is reported in Refs. [4,78] for replication decision criteria, Reference [28] for communication delays, and Reference [82] for disk performance.

Big Data-Oriented Replication Strategies
The latest research shows that the cloud is the best solution for data-intensive applications, offering optimal storage and high performance for huge data on distributed systems. Hence, a planned strategy between cloud and big data is needed to ensure consistent data accessibility without any disruption [83]. Recent research aims to provide data availability and maintain the performance of big data on clouds, even in case of disasters. Since the cloud distributes big data to various nodes, either in the same data center or across many data centers [9,12,13,84], a reliable and efficient solution based on an optimal replication strategy is needed to overcome failures.

Related Surveys
Various research efforts have addressed replication strategies for big data-oriented computing. We collected several reviews of replication strategies for big data-oriented environments to clarify the terminology and approaches used in this area.
Gopinath et al. [25] presented a detailed survey of replication and its implementation in the big data domain, such as HDFS (Hadoop distributed file system). The survey gives an empirical evaluation and covers both static and dynamic replication techniques in depth [25].
Lalitha Singh et al. [83] surveyed data placement strategies for cloud-based scientific workflows. Data placement strategies that handle big data are studied in detail. The main purpose of the study is to improve the performance and reduce the data movement cost [83].
Fazlina et al. [10] introduced a survey that emphasizes performance factors and classifies replication strategies into static and dynamic replication based on their metrics. The survey provides a critical review along with important details collected from various references and also discusses the gaps in replication strategies [10].
Mansouri et al. [85] presented a critical review with important details. They discussed the rapid shift of data-intensive (big data) applications toward heterogeneous distributed computing systems for efficient data management. This work presents a complete review of data replication in cloud computing and data grid computing [85].
Recent research on big data-oriented replication for cloud computing is reported in Reference [86] for elastic replication, Reference [87] for predictive analysis-based replication, Reference [88] for dynamic replica adjustment, and Reference [89] for proactive data management.

QoS-Oriented Replication Strategies
QoS-aware replication allocates replicas by considering the Quality of Service (QoS) requirements of the cloud, such as network delay, bandwidth, loss rate, etc. QoS provides performance guarantees and other vital service qualities, such as availability, reliability, security, dependability, etc. Because they are directly associated with end-users and service providers, the QoS requirements are to deliver the services according to predefined agreements [90]. Many existing replication services [91] are designed to enhance system-oriented metrics rather than user-oriented metrics.

Related Surveys
Various research efforts have addressed replication strategies for QoS-oriented computing. We collected several reviews of replication strategies for QoS-oriented environments to clarify the terminology and approaches used in this area.
Saraswathi et al. [92] provided a detailed survey of data replication in the cloud environment. The proposed classification divides data replication into QoS-aware data replication and dynamic data replication strategies. The paper also notes that different applications have different quality-of-service (QoS) requirements and concludes that it is very difficult to maintain a common QoS during the running phase of the applications [92].
Zia et al. [93] provided a survey of various schemes addressing QoS issues. They analyzed the strengths and weaknesses of the schemes based on their performance. The paper also investigated how performance can be improved by enhancing aspects such as QoS and cost [93].
Recent research on QoS-oriented replication for cloud computing is reported in Reference [12] for distributed cloud data placement, Reference [9] for cost-based replication, Ref. [94] for edge cloud-based replication, Ref. [95] for cost-based data replication and placement, and Refs. [96,97] for replica placement.

Target-Oriented Replication Strategies
Every replication strategy consists of algorithms developed to meet different objectives in certain environments. The main aim is to improve the different performance metrics. Depending on the area to be addressed, the algorithm enhances various performance metrics, such as bandwidth usage, accuracy, response time, energy consumption, etc. [98,99]. Some real-time implementations of replication strategies emphasize fast response time and are used in the big data domain, others are implemented to reduce data storage costs, and a few are developed for the transfer of workflow applications [10].
Each dynamic replication strategy has its target, which represents its objectives. The basic target objectives of data replication strategies are known as primary objectives and include availability, reliability, and performance. The secondary objectives include fault tolerance and load balancing. Besides primary and secondary objectives, there are also tertiary objectives, which are as important as the primary and secondary objectives and must be addressed for efficiency and better performance. They include scalability, elasticity, consistency, and cost.
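The three tiers of target objectives described above can be captured in a small lookup structure, which is convenient when tallying which objectives a strategy addresses. The following Python sketch is only an illustration; the names `OBJECTIVE_TIERS`, `tier_of`, and `count_by_tier`, as well as the example strategy, are hypothetical.

```python
# Tiers of target objectives, grouped as in the text above.
OBJECTIVE_TIERS = {
    "primary": ["availability", "reliability", "performance"],
    "secondary": ["fault tolerance", "load balancing"],
    "tertiary": ["scalability", "elasticity", "consistency", "cost"],
}


def tier_of(objective):
    """Return the tier name for an objective, or None if it is not listed."""
    wanted = objective.strip().lower()
    for tier, objectives in OBJECTIVE_TIERS.items():
        if wanted in objectives:
            return tier
    return None


def count_by_tier(addressed):
    """Count how many objectives of a (hypothetical) strategy fall in each tier."""
    counts = {tier: 0 for tier in OBJECTIVE_TIERS}
    for objective in addressed:
        tier = tier_of(objective)
        if tier is not None:
            counts[tier] += 1
    return counts


# A made-up strategy that targets one objective from each tier.
print(count_by_tier(["Availability", "Cost", "Load balancing"]))
```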

Taxonomy of Target Oriented Replication Strategies
In this study, we examine and collect the different types and categories of surveys related to cloud-based replication, recording the title, survey aim, perspective, target components, and year of publication. To the best of our knowledge, there is no detailed study of replication strategies in cloud environments that is both systematic and comparative. Therefore, we conducted our research from a target objective-based point of view and, hence, provide a systematic target-oriented taxonomy of dynamic replication strategies in the cloud. The latest available surveys include Refs. [65,73].
Before discussing the taxonomy of target-oriented replication strategies in detail, let us take a general look at the available literature (shown in Table 3).
Table 3. Literature review of target-oriented replication strategies.

Replication Strategy Basic Details
Replication Strategy Advantages and Disadvantages
[100] Year 2011
Description: A cost-effective dynamic data replication strategy, namely CIR, which is based on an incremental replication method with the aim of reducing the storage cost while maintaining the data reliability requirement. The approach calculates the replica creation time to determine the storage duration.
Advantages: High data reliability, High availability, Low replication cost, and Low energy consumption.
Disadvantages: High response time and Low load balancing.
[101] Year 2012
Description: A novel dynamic data replication strategy, namely D2RS, which calculates a suitable number of copies based on evaluation and identification of popular data. Moreover, it also analyses and models various relationships accordingly.
Advantages: High availability, Low bandwidth consumption, and Low replication cost.
Disadvantages: High user waiting time, Low speed data access, and Low load balancing.
[102] Year 2015
Description: A cost-effective data reliability mechanism, namely PRCR, which is based on a generalized data reliability model. It works on a proactive replica checking approach to ensure the reliability of the data while maintaining the minimum number of replicas.
Advantages: Cost effective reliability, Less failure rates, Reduced storage space, and storage cost.

[104] Year 2012
Description: An adaptive replication strategy that redeploys dynamically large-scale various file replicas on different data nodes and selects the data files which require replication based on minimal cost in order to improve the system availability.
Advantages: Cost effective, Low response time, Low bandwidth consumption, reduced waiting time, and High data access speeding up.
Disadvantages: Less data availability.
[105] Year 2015
Description: A runtime-based replica consistency mechanism, namely (RBRC), which is mainly used for cloud storage systems. The mechanism achieves a dynamic balance between performance and consistency using read frequency. This method is based on access frequency and its access time.
Advantages: Decreased average file access time, Low replication delay time.
Disadvantages: Average load balancing.
[106] Year 2015
Description: An adaptive consistency guarantee model that probes the consistency index of an observed replicated data object in an online application. The main aim is to reduce response time.
Advantages: Maintained response time and Time delay.
Disadvantages: Assumed Load balancing setting for the Implementation.
[99] Year 2017 Description: A novel replication strategy which is used to reduce data storage cost in workflow applications. The strategy considers various parameters for the cost-related effectiveness, which include access frequency, data center storage capacity, the constraints of dataset dependency, and size of datasets in the build-time stage.
Advantages: Reduced cost of data management, decreased data movement, and decreased data transfer cost.
Disadvantages: Increased response time.
[107] Year 2016 Description: A dynamic cost-aware replication strategy, which optimizes and identifies the least number of replicas that are required to maintain desired availability along with data reliability.
Advantages: Low replication cost, High reliability, and High availability.
Disadvantages: Low consistency rates, Low load balancing, and High response time.
[108] Year 2013 Description: A response time-based replica strategy, namely RTRM, consisting of replica creation methods. The aim is to automatically increase the number of replicas based on average response time while maintaining the performance.
Advantages: High performance, Low response time, High rapid data download, Low energy consumption, and High data availability.
Disadvantages: Low reliability, Low load balancing, and High replication cost.
[109] Year 2013 Description: A modified dynamic data replication strategy with synchronous and asynchronous updating. The work is based on the decision of a reasonable number of replicas, along with the right location of replicas, while keeping in mind the execution time.
Advantages: Execution time, High availability, and Performance.
Disadvantages: Low speed data access and Low load balancing.
[110] Year 2014
Description: A dynamic replica selection and placement strategy which is used for cloud replica management. A replica creation is adapted continuously by changing network connectivity and users. It designs an algorithm for suitable optimal replica selection and placement with a target to increase data availability.
Advantages: Low access time, Low response time, low access cost, Shared bandwidth consumption, and delay time.
Disadvantages: Low Load balancing.
[111] Year 2015
Description: An effective dynamic replica placement algorithm, namely BPRA, which is based on minimal blocking probability. The main intention is to improve the load balancing using user access information.
Advantages: Improved load balance, Reduced access skew, and file access latency.

[52] Year 2019
Description: A data replication strategy for MongoDB. The main aim is to provide the performance requirement for the tenants, while the provider's profit is not ignored.
Advantages: Decreased response time, Resource consumption, and number of replications.
Disadvantages: Low load balancing.
[112] Year 2018
Description: A predictive approach, namely (PredRep), which is used to characterize the cloud database system workload and automatically provide or reduce resources based on the cost factor and SLA agreement.
Advantages: Reduced cost and SLA violations.

[3] Year 2017
Description: A data replication strategy for cloud systems, namely (DPRS), which uses the number of requests and free storage space to determine the number of replicas along with a suitable placement site.
Advantages: Low response time, Enhanced storage space, and Effective network usage.
Disadvantages: Low reliability.
[113] Year 2016
Description: A replica replacement strategy that considers the data file availability, the last time the replica was accessed, access number, and the replica size. The replication not only provides load balancing but also maintained the performance.
Advantages: Increased Performance and Load balancing, less storage usage.
Disadvantages: Missing real time Implementation.
[114] Year 2018
Description: A dynamic adaptive replica strategy, namely (DARS), which uses node's overheating similarity to provide the replica creation time, the replica creation opportune moment and locate optimal replica placement node.
Advantages: Superior performance and Better load balance.
Disadvantages: Lower access delay.
[115] Year 2020
Description: A data Replication Strategy (RSPC) that satisfies both performance and minimum availability tenant objectives while ensuring an economic profit for the provider in Cloud datacenters.
Advantages: Reduced resource consumption, Reduced Costs of provider (penalty and data transfer costs).
Disadvantages: Missing real-time cloud implementation and consistency consideration.
[116] Year 2019
Description: A cost-based dynamic replication strategy (DRAPP) that uses the least number of replicas for simultaneous availability of data and performance tenant requirements in regard while considering the tenant budget along with a profit of provider. While dealing with tenant budget, query scheduling is done in such a way that replicas effectively obey load balancing.
Advantages: Reduced query response time and increased availability.
Disadvantages: Missing real-time cloud implementation and energy consumption consideration.
[117] Year 2018
Description: A cost-based data replication strategy (PEPRv2) for cloud-based systems that effectively satisfies the response time objective (RTO) for executing queries while simultaneously benefiting the provider to return a profit from each execution. It simultaneously satisfies both the SLA terms and profit of the provider. The SLA includes the availability and performance along with maintaining the query load as per the provider's profit.
Advantages: Reduced response time, bandwidth consumption, and monetary expenditure.
Disadvantages: Missing real-time cloud implementation.
In this section, we propose a taxonomy of target-oriented replication strategies based on target objective classification (shown in Figure 4). We classify these target-oriented replication strategies into nine key target objectives based on their attributes, namely availability, reliability, performance, fault tolerance, load balancing, scalability, elasticity, consistency, and cost.
There is always a conflict between the target objectives of replication strategies. For example, reducing cost tends to increase access time and lower performance. In fact, because the target objectives differ in nature, most replication strategies do not satisfy multiple target objectives simultaneously. Each target-oriented replication strategy aims to satisfy a specific target objective to enhance the performance directly or indirectly. The first and most crucial target objective is to improve data availability, which is a must for accessibility and disaster recovery. The other important target objectives include increasing fault tolerance and throughput, providing reliability, scalability, and elasticity, ensuring load balancing, decreasing response time, and providing security. In the future, a hybrid multi-objective replication approach can be planned and designed, as in Reference [73], which will combine the capabilities of all target objectives.

Target Objectives of Target-Oriented Replication Strategies
In this section, the target objectives addressed by different dynamic replication techniques are explained in detail.

Availability
Availability is the readiness of a system for correct service [101]; it guarantees that an item (data or service) is functioning at a given instant of time under defined conditions. Data availability has always been an important topic in distributed environments, where improving the availability of data or services to users leads to a better quality of service. Even in the absence of disasters, organizations should treat data availability as a primary concern for accessibility and smooth functioning. This is why it is considered a primary target objective for replication strategies in the cloud. In all distributed database environments, and especially in cloud computing, replication strategies aim to improve the availability of data, and replicated services help keep services available in case of disasters. Large-scale distributed storage systems regularly use replication strategies to improve the availability of data or services to users. The two metrics that affect data availability in these setups are the number of replicas and the location of the replicas [118]. Availability is closely tied to reliability. Other factors that affect data availability and must be addressed include network link failures and replica allocation.
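The dependence of availability on the number of replicas can be illustrated with a commonly used simplification: if each replica is reachable with the same probability and replicas fail independently, a file is available whenever at least one replica is reachable. The Python sketch below rests on exactly these assumptions (which real data centers only approximate), and the function names and numbers are illustrative rather than taken from the surveyed strategies.

```python
def file_availability(node_availability, replicas):
    """Probability that at least one of `replicas` independent copies is reachable."""
    return 1.0 - (1.0 - node_availability) ** replicas


def min_replicas(node_availability, target):
    """Smallest replica count whose availability meets `target`."""
    if not (0.0 < node_availability < 1.0 and 0.0 < target < 1.0):
        raise ValueError("availabilities must lie strictly between 0 and 1")
    replicas = 1
    while file_availability(node_availability, replicas) < target:
        replicas += 1
    return replicas


# Hypothetical numbers: each node is up 95% of the time, target is 99.9%.
print(min_replicas(0.95, 0.999))
```

Under these assumptions, each additional replica multiplies the probability of unavailability by the per-node failure probability, so the benefit of every further replica shrinks while its storage cost does not.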

Reliability
Reliability [102] aims to give a correct or acceptable result within a bounded time. Data reliability is an important concern in distributed environments, and many efforts have been made to improve it for distributed storage. High reliability is another main target objective for cloud storage systems. Replication strategies improve both data reliability and availability: as the number of replicas (availability) increases, a user's request is more likely to be serviced quickly, and hence the system is more reliable. The metrics that affect data reliability in distributed setups are the disk failure rate, the number of replicas, and the response time, which keeps increasing with an increasing number of tasks [31]. Other metrics that affect data reliability and must be addressed include the data missing rate, storage cost consumption, and effective data replica schemes for decent reliability [102]. Research in this field [119] addresses the reliability issues of large-scale storage systems and provides solutions for them.
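How the disk failure rate and the number of replicas interact can be sketched with a textbook exponential failure model. The Python snippet below assumes a constant failure rate, independent failures, and no repair of lost replicas; it is a simplified illustration and not the reliability model of any specific strategy cited above.

```python
import math


def data_reliability(failure_rate, duration, replicas):
    """Probability that at least one replica survives `duration`.

    failure_rate: assumed constant disk failure rate (failures per unit time).
    duration: storage duration in the same time unit.
    Replicas are assumed to fail independently and are not repaired.
    """
    single = math.exp(-failure_rate * duration)
    return 1.0 - (1.0 - single) ** replicas


# Hypothetical failure rate of 0.05 per year over a one-year storage duration.
for n in (1, 2, 3):
    print(n, round(data_reliability(0.05, 1.0, n), 6))
```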

Performance
Replication is an effective way to increase performance in a cloud computing environment, because service requests from various users can be completed by different replicas. Performance represents the effectiveness of the system [73]. The data storage must support fast and robust data access and update management, and should provide recovery facilities. Performance in large-scale cloud storage systems is always considered one of the important topics and major target objectives to be addressed. Higher availability also increases the performance of data access in a distributed environment. Moreover, replication strategies contribute to data availability, data reliability, load balancing, and response latency [120]. System performance must be achieved at an acceptable cost. Performance is computed in terms of throughput, response time, latency, and so on, which also reflect the quality of the service. The metrics that affect performance include: (1) response time, the time taken by a system to respond to a service request, which should be low; (2) throughput, the number of service requests served in a given time, which should be high; (3) latency, the time delay between a client request and the service provider's response in the cloud; and (4) execution time, the service time to process the sequence of activities [73]. Other metrics also affect performance and must be addressed; they include the number of replicas, which is directly proportional to availability and generally improves performance.
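As a small illustration of how response time and throughput can be derived from measurements, the following Python sketch summarises a list of request timestamps. The record format (arrival time, completion time) and the function name `performance_summary` are assumptions made for this example.

```python
def performance_summary(requests):
    """Summarise a list of (arrival_time, completion_time) pairs in seconds."""
    if not requests:
        raise ValueError("at least one request is required")
    response_times = [done - arrived for arrived, done in requests]
    # Observation window: from the first arrival to the last completion.
    window = max(done for _, done in requests) - min(arrived for arrived, _ in requests)
    return {
        "mean_response_time": sum(response_times) / len(response_times),
        "max_response_time": max(response_times),
        # Throughput: completed requests per second over the observed window.
        "throughput": len(requests) / window if window > 0 else float("inf"),
    }


sample = [(0.0, 0.5), (1.0, 2.0), (2.0, 4.0)]
print(performance_summary(sample))
```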

Fault Tolerance
The stored data must be recoverable if a failure occurs or is predicted on one machine, which means the system should provide a backup instance of the application (data is still available on another machine on the network) that starts without interruption [121]. Hence, fault tolerance techniques minimize the effect of failures on the computing environment. Fault tolerance in cloud computing improves reliability, availability, recovery from failure, and performance metrics, and lowers cost. The dynamic behavior of cloud or distributed environments increases the chance of failures. To overcome the effects of these failures, the cloud should implement fault tolerance aggressively; it is always a crucial target objective when choosing or developing a replication strategy [122]. Replication increases fault tolerance by introducing a balance between consistency and performance during update scenarios. Efficient fault tolerance requires minimum latency [121]. Hence, low latency (network delay), service time, and low overheads are the metrics of fault tolerance. Another metric is the number of replicas, which needs to be controlled to maintain fault tolerance [123]. Fault tolerance provides resilience to cloud-based replication strategies.
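The idea that data remains accessible from another machine when one machine fails can be sketched as a read operation that falls back to the next replica. In the Python example below, the replica interface (a callable that either returns a value or raises `ReplicaUnavailable`) is a hypothetical simplification introduced for illustration.

```python
class ReplicaUnavailable(Exception):
    """Raised by a replica that cannot serve the request."""


def read_with_failover(replicas, key):
    """Try each replica in order and return the first successful read.

    replicas: list of callables taking a key and returning a value,
    or raising ReplicaUnavailable (a hypothetical interface).
    Returns the value together with the number of replicas that failed first.
    """
    failures = 0
    for read in replicas:
        try:
            return read(key), failures
        except ReplicaUnavailable:
            failures += 1
    raise ReplicaUnavailable(f"all {failures} replicas failed for {key!r}")


def failed_node(key):
    raise ReplicaUnavailable("node down")


def healthy_node(key):
    return f"value-of-{key}"


print(read_with_failover([failed_node, healthy_node], "file_a"))
```

The number of failed attempts returned alongside the value hints at the latency overhead of failover, which is one reason why low latency and low overheads are listed as fault tolerance metrics.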

Load Balancing
Load balancing is one of the central target objectives for data replication in cloud computing. In a distributed system, load balancing is the process of distributing and balancing the dynamic local workload (memory capacity, delay, or network load) among various nodes (available replicas) to maintain resource utilization and achieve better job response time [79]. Load balancing improves the overall performance of the system and makes effective use of the available resources, which reduces resource consumption. It also helps to implement fail-over, provide scalability, and avoid performance bottlenecks [79,124]. The metrics that affect load balance in distributed computing include response time, request loss rate, optimal number of copies, and storage [120,125].
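One simple way to distribute requests among available replicas is to send each request to the replica with the fewest outstanding requests. The following Python sketch shows this least-loaded policy; it is one possible policy chosen for illustration, the replica names are made up, and requests are assumed never to complete during the run.

```python
def pick_replica(loads):
    """Return the replica with the fewest outstanding requests.

    loads: dict mapping replica name -> current number of outstanding requests.
    Ties are broken by name so the choice is deterministic.
    """
    if not loads:
        raise ValueError("no replicas available")
    return min(sorted(loads), key=lambda name: loads[name])


def dispatch(requests, replica_names):
    """Assign each request to the currently least-loaded replica."""
    loads = {name: 0 for name in replica_names}
    assignment = {}
    for request in requests:
        target = pick_replica(loads)
        assignment[request] = target
        loads[target] += 1
    return assignment, loads


assignment, loads = dispatch(["r1", "r2", "r3", "r4", "r5"], ["dc_a", "dc_b"])
print(loads)
```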

Scalability
Scalability is another crucial target objective that needs to be addressed for optimal replication on the cloud. Scalability is the capability of a system to handle increasing demand for computational resources as the system grows [90]. Scalability enhances replication [126]. Cloud computing requires scalability for operations on large data sets [90], which increases performance through over-provisioning of resources [127]. Storage systems need to scale promptly to cover increasing workload demands through horizontal or vertical expansion [128]. Many cloud-based applications rely upon data replication to achieve better performance, availability, scalability, and reliability [129]. Elasticity is an extended version of scalability.

Elasticity
Elasticity is another important target objective, used to cope with changing conditions during replication in clouds. Elasticity is the capacity to expand or shrink the number of replicas to adjust to an increasing or decreasing incoming workload [130]. Using elasticity, additional computational resources can be acquired or released automatically (resources provisioned to applications) based on demand (dynamic workload) to minimize the resource cost while fulfilling the Quality of Service (QoS) requirements. Elasticity is also referred to as autoscaling. However, over-provisioning causes resource wastage and extra monetary cost, while under-provisioning leads to performance degradation and violation of the service-level agreement (SLA) [131]. So, an elastic replication strategy should be developed with careful consideration of over-provisioning and under-provisioning circumstances.
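The expand-or-shrink behaviour and the two failure modes (over-provisioning and under-provisioning) can be sketched with a simple capacity rule. The Python example below assumes that the serving capacity of one replica is known and constant; the function names, bounds, and numbers are hypothetical.

```python
import math


def desired_replicas(request_rate, capacity_per_replica, min_replicas=1, max_replicas=10):
    """Replica count needed to serve `request_rate` requests per second.

    capacity_per_replica: requests per second one replica can serve (assumed known).
    The result is clamped to [min_replicas, max_replicas] to bound cost.
    """
    needed = math.ceil(request_rate / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def provisioning_state(current, request_rate, capacity_per_replica):
    """Classify the current replica count against the demand."""
    needed = math.ceil(request_rate / capacity_per_replica)
    if current < needed:
        return "under-provisioned"
    if current > needed:
        return "over-provisioned"
    return "matched"


# Hypothetical workload: 250 requests/s, each replica serves 100 requests/s.
print(desired_replicas(250, 100), provisioning_state(5, 250, 100))
```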

Consistency
Consistency of the placed replicas is one of the crucial parameters that needs to be addressed for an optimal replication strategy on the cloud. Using data replication strategies, a data-intensive application can accomplish fault tolerance, improved availability, and data recovery [8]. Many techniques are used to enhance the consistency of replication on the cloud. In distributed systems (cloud), data consistency is described as a trade-off with data availability and partition tolerance in the CAP theorem (Brewer's theorem) [132]. The CAP theorem states that, out of the three properties (consistency, availability, and partition tolerance), only two can be achieved at the same time [132]. In this regard, consistency refers to the requirement that clients should feel as if they are working on a single node and should not be aware of the number of replicas used or assigned to them.

Cost
Cost is another important target objective of replication strategies. The costs associated with replication strategies can be storage costs or data transfer costs (replication cost) [115]. For economic reasons, cost must be given preference when choosing a replication strategy. The cost of replicating a data file differs between data centers; keeping in view the heterogeneous nature of the system, the cost of replication, availability, and performance should be considered together for optimal replication [107]. The metrics that affect the cost in cloud-based replication strategies include data movement, the cost of data transfer, dataset dependency, access frequency, storage capacities of data centers, and the size of datasets in the build-time stage. Optimized data placement strategies can reduce data movement and save data transfer costs among different data centers [99].
Various studies have addressed cost and its effective utilization in cloud systems: some consider electricity prices [133], some consider replication cost efficiency, some [134] address storage space limitations, and some address the general monetary costs of replication [135]. Recently, considering monetary costs together with tenant and provider profit has become a trend because it benefits both parties (tenant and provider). These monetary-based replication strategies have been classified into provider-centric and consumer-centric strategies, which focus primarily on the service provider's profit and the tenant's profit, respectively [115].
In all target-oriented replication strategies, QoS should integrate all the above-mentioned objectives (availability, reliability, performance, fault tolerance, load balancing, scalability, elasticity, consistency, and cost) to achieve the highest level of target achievement for optimal replication. The SLA contract represents the agreement between a service provider and its customers (agreed-upon guarantees) [6,136] to support basic objectives like data availability, enhanced reliability, performance, etc. [137]. However, the service provider cannot always satisfy the agreed performance levels because of the inherent network latency of the Internet. User expectation of QoS is always high, so it is necessary to address the basic architectural issues, in particular what will happen and who is responsible, as well as to set the tolerance level of business processes [90].
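The two cost components named above, storage cost and data transfer cost, can be combined in a simple additive model. The Python sketch below is an illustrative assumption: real providers price storage and transfer per data center and per tier, and the prices used here are invented for the example.

```python
def replication_cost(size_gb, replicas, storage_price, transfer_price, months=1):
    """Total cost of keeping `replicas` copies of one dataset.

    storage_price: price per GB per month at the hosting data center.
    transfer_price: price per GB moved when a new replica is created.
    One copy is assumed to exist already, so only replicas - 1 transfers are paid.
    """
    storage = size_gb * replicas * storage_price * months
    transfer = size_gb * max(replicas - 1, 0) * transfer_price
    return storage + transfer


# Hypothetical prices: 0.02 per GB-month for storage, 0.05 per GB transferred.
print(replication_cost(size_gb=100, replicas=3, storage_price=0.02, transfer_price=0.05))
```

In this model, storage cost grows with both the replica count and the storage duration, whereas transfer cost is paid once per additional replica, which is why placement decisions that avoid data movement reduce the total.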

Target Objectives and Their Relationship with Parameters
Each replication strategy can address one or more target objectives, and each target objective is composed of one or more attributes (parameters). Different replication strategies cover different parameters based on their target objectives. These attributes act as important metrics for the evaluation of replication strategies in the cloud. In Table 4, various target objectives and their attributes are evaluated.

Table 4. Target-oriented replication strategies with their target objectives, based on their attributes, purpose, and metrics.

Replication Strategy Target Objectives (Priority Based) Attributes (Parameters)-Metrics

The replica number is the availability-related attribute, which is based on a mathematical model that maintains the number of replicas according to the availability requirement. The execution rate, response time, and bandwidth consumption are the performance-related attributes, and they are reduced because of balanced replica placement. The replica placement is the load balancing-related attribute, which is achieved by placing the most popular data files based on access history (access information of data centers). Note: The key investigating parameters are Data Availability, Number of Replicas, Response Time, Execution Rate, and Bandwidth.

The storage cost is the performance-related attribute, which is based on the least number of replicas required for proper availability; the data file is selected on the basis of access intensity, a higher system byte effective rate (SBER), better response time, and the cost of replication. The system byte effective rate, bandwidth consumption, and response time are the availability-related attributes. Note: The key investigating parameters are Data File Availability, Average File Probability, System Byte Effective Rate, and Cost of Replication.
[108] Year 2013 1. Performance (Primary Target Objective) The response time is the performance-related attribute. When the response time is longer than the threshold, the replica number is increased; hence, the system creates a new replica. In addition, other related attributes are network utilization, average job time, high rapid download, and low energy consumption. Based on the new request, the bandwidth is predicted for replica selection. Note: The key investigating parameters are Replica Creation, Replica Selection, and Replica Placement.

The replica number is the availability-related attribute. The number of replicas is considered in terms of the system byte effective rate, which is calculated as the ratio of the number of bytes available to the total bytes requested by all tasks. The system byte effective rate is evaluated in the second stage of the Modified D2RS algorithm, which is best suited for varied periods. The execution time is the performance-related attribute, which increases the performance. Execution time is increased by creating a replica of the data in the data center. The popularity degree is the access frequency based on time factor and user activity. Note: The key investigating parameters are Data Availability, Number of Replicas, Execution Time, and Access Frequency.

The replica number is the availability-related attribute, which is based on the demands of the users and the availability of storage. The strategy chooses the optimal replica selection and placement for availability purposes based on response time and access time. Note: The key investigating parameters are Data Availability, Access Time, and Response Time.

[111] Year 2015 1. Reliability (Primary Target Objective) The replica placement is the reliability-related attribute, which improves reliability and reduces access skew. Reliability is achieved through decreased file access latency. Note: The key investigating parameters are Data Availability, Access Latency, and Replica Placement.

. Load balancing
The replacement strategy is the performance-related attribute, which is based on the availability of the file, the last time the replica was requested, the number of accesses, and the size of the replica. Other performance-related attributes include cost, which relies on the storage size of each site; this is kept limited by keeping only the important data. The replica placement policy is the load balancing-related attribute, which allows storing replicas in the relevant sites based on five parameters (failure probability, storage usage, mean service time, latency, and load variance). Both the performance-related and the load balancing-related attributes aim to improve response time and cost-effective availability. Note: The key investigating parameters are Mean Response Time, Load Balancing, Effective Network Usage, Replication Frequency, and Storage Usage.

The replica creation time and opportune moment are the performance-related attributes, which are based on the node's overheating similarity. The optimal placement node is found using the fuzzy clustering analysis method, and the replicas are then created by the nodes in a decentralized, self-adaptive manner. The optimal placement node is the load balancing-related attribute, which is found from the neighborhood. The optimal placement node improves the probability of the replica being accessed, relieves overloaded high-degree nodes, possesses a low node load, reduces the access delay, and improves the load balance. Low access delay and acceptable load balance are achieved by reducing the node response latency. Hence, low access delay is based on operation time and the ratio of requests to responses. Note: The key investigating parameters are Low Access Delay, Ratio of Average Load, Node Response Latency, and Accessing Pressure.

The response time of the query is the cost-related attribute. The execution of any particular query is estimated and compared with the service level objectives (service quality) that the tenant expects from the provider, along with profit estimation. The strategy also decreases the number of replicas for a given availability. Note: The key investigating parameters are Response Time, Storage Usage, Network Bandwidth Consumption, and Cost.

Quantitative Analysis of Target-Oriented Replication Strategies
Data replication in distributed file systems (clouds) is a technique to store the data (replicas) on multiple servers across multiple data centers with the main aim to improve data availability during failures. The other advantages of replication are to improve the response time, bandwidth consumption, reliability, job performance, throughput, less frequency, reduce data access latency, decrease data transfer amounts, and the costs [138,139].
The focus of each target-oriented replication strategy is to satisfy a specific target objective, following its prescribed metrics, to increase the overall performance. We have observed that several of the dynamic replication strategies discussed in this article tend to address the primary target objectives (most addressed). Some of the dynamic replication strategies address secondary target objectives (average addressed), and a few address tertiary target objectives (least addressed).
From Table 5, we have observed that some strategies, such as Refs. [3,52], are included in both the fault tolerance and the performance categories; likewise, Refs. [100,102] are included in both the fault tolerance and the reliability categories. Similarly, Refs. [52,101,113,114] appear in both the performance and the availability categories, and Ref. [112] is included in the scalability, elasticity, and cost categories. This is because all these strategies address primary and secondary target objectives in a single replication strategy, with each target objective having its own priority. In other words, these strategies address several categories of target objectives simultaneously.

Performance Evaluation of Target-Oriented Replication Strategies: Comparison and Evaluation
Here, we provide a complete and detailed survey for the target-oriented replication strategies in cloud with their attribute status and explanation, as depicted in Table 6. Table 7 shows a summarized form of features included in all target-oriented replication strategies. In this section, we compare and evaluate the reviewed target-oriented replication strategies according to their features. These features are represented, along with their intensities, as LW for Low, MD for Medium, HG for High, IN for increased, NA for not addressed, YS for yes addressed, and NC for No Change.

Performance Evaluation Understanding
In our research, we included a total of 22 different target-oriented replication strategies (2011 to 2019) in the cloud domain (shown in Figure 5). Each strategy addresses one or several target objectives through one or more attributes. We have observed that the primary target objectives (most addressed), which include availability, reliability, and performance, cover a total of 80 percent. The remaining target objectives cover the other 20 percent: these are the secondary target objectives (average addressed), comprising fault tolerance and load balancing, and the tertiary target objectives (least addressed), comprising scalability, elasticity, consistency, and cost. Elasticity covers 5% and consistency covers 15%. The other target objectives, such as fault tolerance, load balancing, scalability, and cost, are addressed indirectly, along with the directly addressed target objectives. In future research, we recommend that the least addressed target objectives be addressed together with primary target objectives in a single replication strategy; e.g., scalability should be considered with availability. Moreover, efforts should be made to develop a dynamic replication strategy that addresses almost all target objectives (most addressed, average addressed, and least addressed) together in one algorithm. A detailed overview of all strategies included in this paper is given in Figure 6 (pie chart of quantitative analysis of target objectives). The functional metrics included in this work are the previously used performance metrics of cloud data replication and management for cloud systems [7]. Indeed, for the best optimization, the metrics discussed should contribute to increasing the overall performance by addressing many parameters of target objectives.
The prime target includes the system availability, which is always a key factor for the overall enhancement and optimization. For a better system availability, the frequently accessed data is distributed to multiple suitable locations, from which the users can access the data from a nearby site [140].
In the future, the metrics or target objectives of target-oriented replication strategies in cloud computing should also contribute to improving the security of dynamic replication strategies, as in Ref. [141], because security weaknesses can indirectly lead to data loss.

Challenges for Replication Strategies in Clouds
The main issues of replication revolve around data availability, cost, and performance. Frequently used data should be replicated to multiple locations to increase data availability and enhance performance; this makes it easier for users to access the data from nearby sites [54]. Other issues and challenges include data consistency, downtime during new replica creation, maintenance overhead, and lower performance [34].
Some of the latest work in the field of replication includes Refs. [142,143].

Challenges of Dynamic Replication Strategies in Clouds
Cloud replication primarily aims to increase the resource availability, reduce the delay time, minimize the access cost, and share the bandwidth consumption. During dynamic replication, decisions are made based on the resource availability and current access patterns.
There are two major issues in replication: which data to replicate (replica selection) and where to place the replicas (replica placement) [144]. Besides these, there are two other related issues: when to replicate (replica time) and how many replicas to create (replica quantity). These are as important as the two major issues [145]. Hence, the four important issues of any data replication strategy are (1) what data should be replicated, (2) where to place a new replica, (3) when a replica should be created or deleted, and (4) how many replicas to create [52].
Some of the latest work in the field of dynamic replication includes Refs. [146,147].

Replica Selection
One of the major issues in cloud-based replication is replica selection. To meet user requirements, such as reducing the waiting time and increasing data access, replica selection must be addressed effectively. If a data file is replicated too early, or if replica selection is not done efficiently, extra storage space is consumed unnecessarily and the associated storage cost increases.
The available solutions include selecting particularly popular data [148], selecting data with relatively higher reliability and longer storage duration, or applying a light-weight time series prediction technique [31].

Replica Placement
Another vital issue in cloud-based replication is replica placement. The replica factor is one of the key factors of replica placement. Replica placement promotes data availability and service quality. The two main issues in replica placement are how to determine the replica factor and how to select the optimal data node for storing a replica. Replica placement algorithms are categorized into two basic types: static and dynamic. A static replica placement algorithm generates replicas and selects data nodes at the initialization of the cloud storage system; such algorithms are easy to deploy. A dynamic replica placement algorithm selects the optimal data node at run time, based on currently available data; such algorithms cannot be deployed as easily [111,149].
Deciding where to place a replica is a crucial point in cloud computing architectures. One readily available solution is to rely on the file access history.
The available solutions for replica placement in different replication strategies include the blocking probability technique used in Ref. [84], the data node access information technique used in Ref. [101], and the heuristic search algorithm used in Refs. [31,104]. In general, we need to determine the best location and reduce the access latency for efficient replica management [150].
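As an illustration of placement driven by file access history, the following minimal Python sketch picks the node that requested a file most often and does not yet hold a replica. It is not the method of any cited strategy; the function name `choose_placement_node`, the log format, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def choose_placement_node(access_log, file_id, nodes_with_replica):
    """Return the node that accessed `file_id` most often and holds no replica.

    access_log is an iterable of (node, file_id) pairs (hypothetical format).
    Returns None when no eligible node has accessed the file.
    """
    counts = Counter(
        node
        for node, accessed_file in access_log
        if accessed_file == file_id and node not in nodes_with_replica
    )
    if not counts:
        return None
    # Highest access count wins; ties are broken by node name for determinism.
    return min(counts, key=lambda node: (-counts[node], node))


log = [("dc1", "f"), ("dc2", "f"), ("dc2", "f"),
       ("dc3", "g"), ("dc1", "f"), ("dc1", "f")]
print(choose_placement_node(log, "f", set()))
```

A real strategy would also weigh storage usage, node load, and latency; access counts alone are used here only to keep the sketch short.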
Many related surveys and open research issues are mentioned in Ref. [151], and some solutions for replica selection and replica placement in cloud setups are mentioned in Refs. [140,148,152].

Replica Time
Another crucial issue in cloud-based replication is when to replicate. Selecting the proper time not only enhances availability but also indirectly reduces the storage cost. Replica selection and replica time should be coordinated to achieve data availability, low-cost storage, and reliability. For efficient cloud-based replication, this factor must be given the utmost attention.
The solutions used in many cloud-based replication strategies are mostly threshold-based. According to these solutions, the right time to replicate data is when (1) the access frequency is greater than the threshold, (2) the replica creation time point is reached, (3) the popularity exceeds the threshold, (4) the original copy does not meet the user-specified reliability requirement, or (5) the replication factor is less than the specified threshold [145].
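The five threshold conditions listed above can be sketched as a single check. This is a minimal Python illustration, not an implementation from Ref. [145]; the class names `FileStats` and `Thresholds`, their fields, and the function `replication_triggers` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Observed state of one data file (hypothetical fields)."""
    access_frequency: float       # accesses per observation window
    popularity: float             # popularity degree of the file
    replica_count: int            # current replication factor
    estimated_reliability: float  # reliability of the existing copies


@dataclass
class Thresholds:
    """Operator-chosen limits (hypothetical fields)."""
    access_frequency: float
    popularity: float
    min_replicas: int
    required_reliability: float


def replication_triggers(stats, limits, at_creation_time_point=False):
    """Return the names of all threshold conditions that currently hold."""
    reasons = []
    if stats.access_frequency > limits.access_frequency:
        reasons.append("access_frequency")       # condition (1)
    if at_creation_time_point:
        reasons.append("creation_time_point")    # condition (2)
    if stats.popularity > limits.popularity:
        reasons.append("popularity")             # condition (3)
    if stats.estimated_reliability < limits.required_reliability:
        reasons.append("reliability")            # condition (4)
    if stats.replica_count < limits.min_replicas:
        reasons.append("replication_factor")     # condition (5)
    return reasons


limits = Thresholds(access_frequency=100.0, popularity=0.8,
                    min_replicas=3, required_reliability=0.99)
hot_file = FileStats(access_frequency=150.0, popularity=0.5,
                     replica_count=3, estimated_reliability=0.995)
print(replication_triggers(hot_file, limits))
```

Returning the list of satisfied conditions, rather than a single boolean, lets a caller decide whether any one condition suffices or several must hold together; individual strategies differ on this point.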

Replica Quantity
Another crucial factor in cloud-based replication is the replica quantity. Besides meeting the system availability and reliability requirements and limiting the cost of replica maintenance, an important issue is how many replicas to create: beyond a certain point, increasing the number of replicas does not increase availability but consumes storage space unnecessarily, hence increasing the cost of storage. According to Refs. [84,150], deciding the replica quantity is very important for cost-effectiveness.
The available solutions include a mathematical model that captures the relationship between the availability requirement and the number of replicas (built on the theory of temporal locality, which states that recently accessed data files are likely to be accessed again in the near future). Another solution considers the storage duration, the number of replicas, and the user-specified reliability requirement, and a few calculate the number of replicas using a smoothing factor parameter [106,145]. Figure 7 depicts the issues and future research directions of dynamic replication strategies in clouds.
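The relationship between the availability requirement and the number of replicas can be illustrated with a simple model. The Python sketch below assumes that each replica is available independently with the same probability p, so that a file with n replicas is available with probability 1 - (1 - p)^n. This independence assumption and the function names are ours, chosen for illustration; the cited strategies may use different models.

```python
def file_availability(replica_availability, replica_count):
    """Availability of a file with `replica_count` independent replicas."""
    return 1.0 - (1.0 - replica_availability) ** replica_count


def min_replicas(replica_availability, required_availability, max_replicas=64):
    """Smallest replica count that meets the availability requirement."""
    if not 0.0 < replica_availability < 1.0:
        raise ValueError("replica_availability must be strictly between 0 and 1")
    for count in range(1, max_replicas + 1):
        if file_availability(replica_availability, count) >= required_availability:
            return count
    raise ValueError("requirement not reachable within max_replicas")


# Each extra replica adds less availability than the previous one.
for count in range(1, 5):
    print(count, file_availability(0.9, count))
print(min_replicas(0.9, 0.995))
```

Under this model the availability gain shrinks geometrically with every added replica, which is consistent with the observation that adding replicas beyond a certain point mainly increases storage cost.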

Least Addressed Target Objective of Target-Oriented Replication Strategies in Clouds, Their Challenges, Issues, and Future Research Directions
In this section, we discuss the challenges, issues, and future research directions of the tertiary target objectives one by one. Tertiary target objectives are the target objectives that are least addressed across all replication strategies. They are discussed below.

Scalability: Challenges and Issues
Due to the huge scale of the data stored in data centers, there is always a need to scale quickly to meet workload demands. These huge data centers in a distributed setup are more prone to failures. Therefore, distributed cloud resources need to be utilized efficiently to minimize the costs associated with storage and to maintain effective communication among applications, along with data availability. The replica locations and the associated communication cost are always a major concern for replication strategies in cloud computing paradigms.
Cost-effective storage and the accommodation of load spikes are considered major challenges. Furthermore, resource utilization must adapt flexibly to resource availability, the addition of new resources, load variations, and the distribution of client locations [66,153].
Hence, scalability is an important metric that must be given strong emphasis by all replication algorithms. Various factors affect scalability. The most important is the choice of architecture, which plays an important role in the success of data replication. Different architectural models (grid, cloud, or other) possess different levels of scalability, which means that scalability depends more on the model than on the replication algorithm.

Future Research Directions for Scalability
An analytical study shows that scalability depends more on the architecture model (grid or cloud or other) rather than the replication algorithm [66]. Different architectural models support different levels of scalability. Therefore, while targeting performance through scalability, we need to choose the architectural models of the cloud for the replication strategies.
Another way to improve the scalability of replication strategies on cloud is to use asymmetric processing, in which the transactions are initially processed at the originating location sites and then are collectively and eventually propagated to other sites, while, in symmetric processing, the updates are sent and executed at all replicated sites [154]. Some of the latest work in scalability include Refs. [155,156].

Elasticity: Challenges and Issues
Elasticity is the capability to expand (scale up) and shrink (scale down) the number of replicas according to the incoming load. Resource provisioning is one of the biggest issues in distributed computing configurations [157], especially for dynamic workloads and dynamic environments. The available solutions include proactive and reactive approaches [112]. During high workloads, the data storage must be able to expand with the increasing load, and during low load it must shrink by releasing unutilized cloud resources [128]. Elasticity and scalability are two interrelated objectives; elasticity additionally allows resources to shrink, besides expanding.
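A reactive approach to expanding and shrinking the number of replicas can be sketched as follows. This minimal Python example is an assumption-laden illustration rather than the method of Ref. [112]: it supposes that each replica can serve a fixed request rate, and the names `target_replicas` and `scaling_action` and the default bounds are hypothetical.

```python
import math


def target_replicas(request_rate, capacity_per_replica,
                    min_replicas=1, max_replicas=10):
    """Replica count needed for the current load, clamped to the allowed range."""
    needed = math.ceil(request_rate / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def scaling_action(current_replicas, request_rate, capacity_per_replica,
                   min_replicas=1, max_replicas=10):
    """Return ("expand" | "shrink" | "hold", number of replicas to add or release)."""
    target = target_replicas(request_rate, capacity_per_replica,
                             min_replicas, max_replicas)
    if target > current_replicas:
        return ("expand", target - current_replicas)
    if target < current_replicas:
        return ("shrink", current_replicas - target)
    return ("hold", 0)


print(scaling_action(3, request_rate=450, capacity_per_replica=100))
print(scaling_action(5, request_rate=120, capacity_per_replica=100))
```

A proactive variant would feed a forecast request rate into the same functions instead of the currently measured one.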

Future Research Directions for Elasticity
While addressing elasticity, researchers should include scenarios with busy workloads and should adopt different forecasting methods. Only then can improved performance and low cost be achieved. New scenarios can be tried that include load balancing objectives along with elasticity. The adaptive use of more virtual machines can also be emphasized. Another way to increase performance is to use SLA protocols together with elasticity in cost-effective approaches [112]. In the future, researchers can plan to extend elasticity with a queuing theory-based model, "where the server is treated as a queuing framework and its theoretic results are used to derive a relationship between the request rate, service times of requests, and the response time SLA", for the estimation of capacity regarding provisioning on the cloud [158]. Some of the latest work in elasticity includes Refs. [159,160].
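To make the queuing-based relationship between request rate, service time, and response time SLA concrete, the sketch below uses the textbook M/M/1 queue, whose mean response time is 1 / (mu - lambda) for service rate mu and arrival rate lambda. The choice of M/M/1, the even splitting of load across servers, and the function names are assumptions for illustration; they are not taken from Ref. [158].

```python
import math


def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue (requires arrival_rate < 1/service_time)."""
    service_rate = 1.0 / service_time
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def servers_needed(arrival_rate, service_time, sla_response_time):
    """Servers required so each M/M/1 server meets the mean response time SLA.

    Assumes the total arrival rate is split evenly across identical servers.
    """
    if sla_response_time <= service_time:
        raise ValueError("SLA cannot be tighter than the bare service time")
    service_rate = 1.0 / service_time
    # From 1 / (mu - rate) <= SLA it follows that rate <= mu - 1 / SLA.
    max_rate_per_server = service_rate - 1.0 / sla_response_time
    return max(1, math.ceil(arrival_rate / max_rate_per_server))


servers = servers_needed(arrival_rate=100.0, service_time=0.1, sla_response_time=0.5)
print(servers, mm1_response_time(100.0 / servers, 0.1))
```

The bound follows directly from rearranging the M/M/1 formula, so the estimate is only as good as the assumption of exponential interarrival and service times.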

Consistency: Challenges and Issues
Maintaining consistency enhances replication strategies to a great extent. In a replicated domain, primary importance should always be given to data integrity and consistency for high performance. Strict consistency is always required by high-precision applications [34]. A consistency model in a distributed domain determines which guarantees can be expected for an update operation and for accessing an updated object. Obtaining the correct balance between higher levels of consistency and availability is an open challenge in cloud computing architectures [128]. However, more replication increases the number of inconsistent replicas, and strong (traditional synchronous) replication has its restrictions because of deficient performance and latency. Another important factor hindering strong replication in clouds is the geographical distance between sites. Moreover, frequent data updates occur in clouds, which makes it burdensome to maintain the consistency of the replicas across the entire cloud [26].

Future Research Directions for Consistency
Achieving strong consistency in the cloud incurs higher downtime because latencies become more prominent with strong consistency. Strong consistency is expensive not just in transactional cost but also in terms of replica availability and system performance [105]. Consequently, cloud storage systems have moved to eventual consistency (all replicas eventually receive all writes). The major advantages of eventual consistency are performance and high availability, while it still provides a good enough consistency guarantee for production systems. However, maintaining both availability and performance while preserving consistency is too costly. As the number of users on the cloud increases (more users deliver more updates), there will be more stale data (probably two out of three reads are useless), which gradually decreases the performance. Therefore, there is a strong need to maintain high availability and consistency while not degrading the performance [26]. In this regard, researchers should pay attention to maintaining a balance between consistency, availability, and performance using adaptive methods [105,106] for better consistency. Some of the latest work in consistency includes Refs. [161,162].

Cost: Challenges and Issues
Regarding the economic aspect of replication strategies in cloud systems, cost plays a vital role, as it acts as the most important objective when choosing a replication strategy. Desirable system performance must always be obtained at an acceptable cost [73]. There are various types of costs associated with replication strategies. Some are related to storage (data storage or data transfer costs), which depend on replica time, replica quantity, replica selection, and data movement [9,83,95]; some are related to QoS and are based on mutual agreements [93]; and some are related to monetary cost while considering the benefits of tenants and providers [115].

Future Research Directions for Cost
Cost is a very important attribute that needs further discussion due to its direct effect on the sustainability and economic aspects of cloud systems. Cost and its utilization in high processing systems should be given top priority, as all types of replication costs are directly associated with end-users and their service providers. Ideally, the replication cost should increase the benefits of both providers and users while guaranteeing performance. There is a strong demand to lower replication costs without degrading the performance.
One future direction is to balance an optimal number of tenants through the pay-as-you-go model while satisfying the response time attribute, resulting in an optimal profit for the provider [115]. Another future direction is the implementation of these cost-based replication strategies while taking energy consumption into consideration [116].
Reference [163] has mentioned some of the valuable future directions. Besides all enhancements, these cost-based replication strategies should be implemented in a real cloud environment [116,117]. Some of the latest work in cost include Refs. [164,165]. Figure 8 depicts the future research directions of least addressed target objective replication strategies in clouds.

Discussion
As proposed in Section 4, the target-based dynamic replication taxonomy in cloud configurations provides depth in understanding the target objectives in the form of primary, secondary, and tertiary target objectives, based on most addressed, average addressed, and least addressed objectives. Most target-oriented replication strategies have concentrated on data availability, followed by reliability and performance. These three target objectives are called primary target objectives, as they are the most addressed. The taxonomy also identifies the secondary target objectives, such as fault tolerance and load balancing, which are average addressed, and the tertiary target objectives, such as scalability, elasticity, consistency, and cost, which are least addressed. Table 4 represents the relationship of various dynamic replication strategies with their target objectives based on their attributes, purpose, and metrics. Distinct target-oriented replication strategies cover different parameters based on their target objectives (either directly or indirectly). These attributes act as vital metrics for the evaluation of target-oriented replication strategies in the cloud. Table 5 represents the quantitative analysis of all target objectives in the form of a summary. Table 3 represents the literature review of target-oriented replication strategies.
In Section 5, a complete performance evaluation of the different target objectives was performed in detail, along with a feature comparison. We provide a comparative analysis and evaluation of various strategies in the cloud computing environment, shown in Table 6 and Figure 6. Table 6 shows how various research papers have considered different parameters and discusses the impact of each strategy on the target objectives. After reviewing the various target-oriented replication strategies comprehensively, it can be stated that different strategies have considered different metrics for evaluation. With respect to target objectives, some strategies have considered a single target objective, while others have included multiple target objectives in their metrics. Table 7 shows the features of each target-oriented replication strategy, along with their intensities.

Conclusions
Replication strategies have been widely adopted in current cloud systems for data availability, reliability, and performance. Their adoption improves system resilience during disasters without any downtime. Cloud replication strategies tend to preserve huge, geographically distributed data, which creates the need for an optimal replication strategy with acceptable performance. We filter out the dynamic replication strategies and evaluate their optimization capabilities based on a quantitative analysis of target objectives (primary, secondary, and tertiary target objectives) using the different attributes that are addressed. We provide a critical quantitative analysis and a comprehensive performance evaluation based on target objectives. We perform a comparative parameter evaluation, along with a metrics comparison. The paper also discusses the challenges, issues, and future research directions. This study will help researchers identify the research problems of replication strategies in cloud computing configurations and will provide in-depth detail on the available dynamic replication strategies and target-oriented replication strategies. This research will open a new gate to developing the optimal dynamic replication strategy for clouds in the future.