Providing Predictable Quality of Service in a Cloud-Based Web System

Abstract: Cloud-computing web systems and services revolutionized the web. Nowadays, they are the most important part of the Internet. Cloud-computing systems provide the opportunity for businesses to undergo digital transformation in order to improve efficiency and reduce costs. The sudden shutdown of schools and offices during the COVID-19 pandemic significantly increased the demand for cloud solutions. Load balancing and sharing mechanisms are implemented in order to reduce the costs and increase the quality of web service. Combining those methods with adaptive intelligent algorithms can deliver a high and predictable quality of service. In this article, a new HTTP request-distribution method in a two-layer architecture of a cluster-based web system is presented. This method provides efficient processing and predictable quality by servicing requests within adopted time constraints. The proposed decision algorithms utilize fuzzy-neural models that allow service times to be estimated. This article provides a description of this new solution. It also contains the results of experiments in which the proposed method is compared with other intelligent approaches, such as Fuzzy-Neural Request Distribution, and with distribution methods often used in production systems.


Introduction
Nowadays, one of the most complex systems built by humans is the Internet network which is composed of numerous resources, web content, access devices, and a large number of users. Over the last decade, the number of active Internet users has increased to 4.66 billion. Almost 91% of World Wide Web (WWW) users use mobile devices as well as mobile phone networks and this percentage is expected to increase in the future [1].
To ensure effective and fast access to web content, many companies use cloud-computing solutions. This has opened new opportunities for providing large-scale computing resources [2]. The global cloud-computing market size is expected to grow from USD 371.4 billion in 2020 to USD 831 billion by 2025. The sudden shutdown of offices, enterprises, and schools during the COVID-19 pandemic has increased the demand for cloud solutions and services. It seems that many solutions developed during the pandemic will remain, especially in the field of the home office. However, this requires the continued transfer of solutions known from everyday life to the Internet platform [3].
Cloud-computing systems enable access to a shared pool of computing resources such as servers, storage, applications, and services delivered on-demand [4,5]. The organization of resources improves the performance, utilization of resources, and energy consumption management and helps to avoid SLA (Service-Level Agreement) violations [6].
The solutions used in cloud computing mean that their application costs are lower compared to building server infrastructures in enterprises. Nevertheless, cloud-computing systems do not provide access to unlimited resources from the client's point of view, mostly due to the costs of use. In order to reduce the costs and increase the quality of service (QoS), or especially the quality of web service (QoWS), load balancing and load sharing mechanisms are implemented. The proper and effective distribution of the load is still an open problem in cloud computing. Little comprehensive research in the field of cloud computing has been done. New architecture structures and algorithms meeting clients' demands need to be developed.
The nodes in the cloud-based web system are distributed. There are two types of load sharing techniques related to the centralization of decision units [7]: centralized and distributed. In centralized systems, the distribution decisions are made by a single node. This node stores knowledge of the whole cloud system and uses load sharing algorithms. Distributed and hierarchical load sharing involves different layers of the cloud. There is no single node fully responsible for making distribution decisions.
The complexity of cloud infrastructures has created the need for innovative monitoring approaches [8]. The complex nature, at many different levels, of this kind of system is the underlying cause of many problems. This has an effect on all aspects of cloud operation and needs to be handled properly in order to provide high performance, dependability, and quality of service etc. [9].
Most cloud systems are physically divided into regions (server rooms) placed in different geographical locations. Regions are built of availability zones, which are parts of the regions with independent network infrastructure [10].
Web switches distribute the load in cloud systems. In a typical web cloud using only one region, a centralized cloud contains one web switch distributing the load across the whole region. The switch is placed in one of the zones. This is called a one-layer architecture (Figure 1a). A distributed hierarchical web cloud contains many web switches making decisions on different layers. Figure 1b presents a two-layer architecture with one web switch in the region and separate web switches in the zones.
The quality of web services can be assessed by end-users in many different ways. Interesting website content increases the number of users. However, it is crucial to deliver the content immediately after receiving a request. Users will consider the service to be of low quality when the time required to get the page is too long. Therefore, the quality of the web service is often evaluated based on the time required to receive the content. Time delays in fetching web content are related to the time of sending data via the Internet and the time of servicing HTTP requests by the web services, especially cloud-based web systems.
Based on predictability, the web services can be categorized as best-effort, predictable, and guaranteed [11]. Best-effort services provide no control over how the service will satisfy the user requirements. It means that the rest of the system (users and applications) will need to adapt to the service's state and the service will be unpredictable. In most cases, best-effort services do their best to service requests as fast as possible, but they cannot guarantee anything. Guaranteed service is the opposite of best-effort service. Guaranteed service is predictable and reliable. When service is not available, the provider must account for the loss of service. There are also predictable services. They are something between best-effort and guaranteed services. They provide some degree of predictability but do not require the accountability of guaranteed service.
The HTTP request distribution method in a two-layer architecture of a cluster-based web system is presented in the article. The method allows predictable QoWS to be provided by keeping the service times within adopted constraints and at the same time processing HTTP requests very efficiently. The method is called Web Cloud Earliest Deadline First (WCEDF). The main element designed to be used in the method is the WCEDF web switch. It allows the incoming requests to be queued by letting the time-consuming requests be serviced first. It additionally distributes requests in a way that keeps selected resources of the cloud idle and ready to service time-consuming requests faster. The presented method is another step towards the development of algorithms and methods of load distribution and sharing in cloud-computing systems.
The rest of the article is composed as follows: In Section 2, the related work is presented with a description of the previous works that constitute the basis for a new solution. Section 3 provides a description of the WCEDF method. In Section 4, the testbed and results of experiments and discussion are presented. Section 5 presents conclusions and the directions of future research. Section 6 summarizes the article.

Related Work and Motivations
Cloud computing uses distributed systems to meet the very high demand for computational power and quality of service. Load balancing and sharing methods are used to distribute tasks and requests among nodes in the system [12]. Those methods help to reduce utilization of the resources, improve the response times, and enable scalability [13,14].
Web-based cloud systems that are publicly available for users usually use simple solutions as far as load distribution is concerned. For example, Amazon Web Services (AWS), the largest player on the cloud-computing market, provides the following algorithms in the web switches available for use [10]: Round-Robin, which is a carousel algorithm forwarding requests to subsequent servers; Least Load, which assigns HTTP requests to the nodes with the least loaded servers; and Path-based routing, in which requested resources are assigned to given web servers.
Distribution strategies for cluster-based systems have been developed for a long time and nowadays, they are adapted to new structures and solutions of cloud-based systems. Load balancing and sharing strategies can be divided into three main categories [12,15,16,17,18]: static, dynamic, and adaptive. In static load balancing, decisions are made using deterministic or probabilistic algorithms that do not take into account the current state of the system. Round-Robin and Path-based routing are good examples of such strategies.
In a dynamic approach, load balancing assignments are conducted based on the current state of the system. The most popular dynamic load balancing algorithm is Least Load.
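The difference between the static and dynamic categories can be illustrated with a minimal Python sketch. The class names and the `active` list of per-node request counts are illustrative assumptions; they do not correspond to any AWS interface.

```python
from itertools import count


class RoundRobin:
    """Static strategy: the decision ignores the current state of the system."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        self._counter = count()  # running number of distributed requests

    def choose(self, active):
        # `active` is accepted only for interface symmetry and is not used.
        return next(self._counter) % self.n_nodes


class LeastLoad:
    """Dynamic strategy: picks the node with the fewest requests in service."""

    def choose(self, active):
        # Ties are resolved in favour of the lowest node index.
        return min(range(len(active)), key=active.__getitem__)


active = [4, 1, 3]  # hypothetical numbers of requests in service per node
rr = RoundRobin(n_nodes=3)
rr_choices = [rr.choose(active) for _ in range(4)]  # cycles through the nodes
ll_choice = LeastLoad().choose(active)              # node with the lowest load
```

Round-Robin cycles through the nodes regardless of `active`, whereas Least Load changes its choice as soon as the counts change.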
The adaptive approach includes the most complex solutions. Decisions are made on the basis of the state of the system; however, the strategy itself can change as the state of the system changes [19]. The majority of the adaptive strategies are intelligent approaches. Some specialists [20] claim that only those approaches can effectively provide an acceptable service time in web traffic conditions, which are typically characterized by self-similarity and burstiness [21][22][23].
Many studies have been conducted on the application of artificial intelligence mechanisms in distribution strategies. Some of the algorithms use natural-phenomena-based strategies. A good example of such a strategy is the Artificial Bee Colony (ABC) whose decision mechanism imitates the behavior of bees [18]. Another strategy is Particle Swarm Optimization (PSO) which is based on an algorithm taking into account the costs of service and transmission [24,25]. Artificial neural networks also have been used in adaptive load distribution systems [26,27].
The problem of providing guaranteed services in web systems has been the topic of much recent research [28][29][30][31][32][33]. Most of the works focus on providing differentiated services for classes of clients using priority-based scheduling. However, proposed solutions allow the quality of the web services to be retained only for a specific group of users, and it is connected with rejecting requests of other users. There have only been a few works devoted to the problem of providing predictable services [34][35][36].
Many previously designed solutions providing guaranteed services for web systems can be applied in cloud-computing web systems. Unfortunately, most of them do not fit well to the architecture and the structure of the cloud. There have not been many works devoted to the problem of designing solutions for guaranteed and predictable services provided in cloud-based web systems [37][38][39][40].
The author's latest works were devoted to the problem of adapting solutions designed for cluster-based web systems to the specificity of work in the web cloud. Intelligent FNRD (Fuzzy-Neural Request-Distribution) web switches working in two layers of the cloud, the region and the zones, have been used [41][42][43]. The web switches used a neuro-fuzzy decision system estimating the HTTP request service times. The results of the conducted experiments revealed that the intelligent web brokers in the two-layer architecture cooperated and learned each other's behavior, in this way achieving very high efficiency of the cloud-based web system [44]. Taking into account these results and the experience in designing predictable services in cluster-based systems [34,36], a new distribution method was proposed by the author. The proposed WCEDF method uses intelligent web switches and provides predictable services in a two-layer architecture of a cluster-based web system.

Web Cloud Earliest Deadline First Method and Web Switch
Servicing HTTP requests at a high level of quality in cloud-based web systems is crucial for many website owners. Requirements regarding the quality of web services can differ depending on the expectations of the users of the website. The WCEDF method presented in this article allows HTTP requests to be distributed in a way that keeps the service time within the expected time limits. The web cloud system using the WCEDF method is designed to work in a single region of the cloud and is composed of the following elements (Figure 1b):
• WCEDF web switch distributing HTTP requests among availability zones in the region;
• FNRD web switches distributing HTTP requests among web servers in availability zones;
• Frontend WWW servers and backend database servers, together servicing HTTP requests.
The most important element in the system is the WCEDF web switch receiving HTTP requests from clients (web browsers) and distributing them among availability zones in one region of the cloud system. The WCEDF web switch uses techniques of artificial intelligence to adapt to the changing environment and to learn the behavior of availability zones. Inside the zones, the intelligent FNRD web switches distribute HTTP requests among web servers.
As was mentioned before, previous research showed that FNRD web switches working in different layers cooperate with each other achieving very good results. In the WCEDF method, it is also desired to get the effect of cooperation of web switches and to additionally achieve a new goal in decision-making. In the proposed method, the WCEDF switch working on the region should cooperate with FNRD switches working on a zone layer.
The main aim of the proposed WCEDF switch is to make the cloud system service the request in a time that is no longer than the one adopted for a specific website. The adopted time boundaries to service the request have to be reasonable and possible to achieve even for time-consuming requests requiring more resources. It is also assumed that the system will not reject HTTP requests to guarantee the service time when the load of the web system is too high. In this case, the WCEDF web switch will do its best to make the time as short as possible.
It should be noted that the design of the WCEDF switch does not include all features of production web switches used in practical applications. It lacks solutions associated with security, resistance to cyber-attacks, and a failover system used when the web server or the web switch fails.
The proposed WCEDF web switch is composed of two main sections (Figure 2). The first one, the Scheduling Section, queues the incoming requests and lets time-consuming requests be serviced first. The second, the Switching Section, distributes requests in a way that keeps selected availability zones idle and ready to service time-consuming requests faster. Both sections act independently, trying to achieve a similar goal, which is servicing the request in a time shorter than the adopted value $t_{max}$. This value determines the level of quality of service that should be achieved for a specific web system.

Scheduling Section
The Scheduling Section queues incoming requests and organizes their order so that they can be serviced in a time shorter than the expected time $t_{max}$. This time should be achievable for a web system that is not overloaded, even for time-consuming requests.
The scheduling section is composed of three modules: a request-analysis module, a service module, and a queue module.
The incoming HTTP request $r_i$ (where $i$ is the index of the request and $i = 1, \ldots, I$) is, on the one hand, a physical HTTP request containing expressions compatible with the HTTP protocol and, on the other hand, a set of information. In Figure 2, the request $r_i$ is marked by a double line, representing the physical flow of the request in the switch, and a single line, representing the carried information.
The request analysis module classifies the incoming HTTP request $r_i$ into a single class $k_i$ ($k_i \in \{1, \ldots, K\}$), which is a number representing requests with similar service times. Requests can be divided into two groups. The first consists of requests for static objects such as images and HTML, CSS, or JavaScript files. The second group (called dynamic requests) is connected with the dynamic content of responses produced when the request arrives at the WWW server (by executing scripts on the server in, for example, PHP, Python, Java, or .NET Core). Static requests are classified on the basis of the size of the requested file. Dynamic requests are classified separately for each requested object, for example, by taking into account the address of the object.
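The classification rule described above can be sketched in Python as follows. The size thresholds, the list of dynamic suffixes, and the class numbering are hypothetical values chosen for illustration only; the text specifies just that static requests are grouped by file size and that each dynamic object gets its own class.

```python
from bisect import bisect_right
from urllib.parse import urlsplit

# Hypothetical size thresholds (bytes) separating the static classes.
STATIC_SIZE_BOUNDS = [1_000, 10_000, 100_000, 1_000_000]
# Hypothetical suffixes used to recognise dynamic objects.
DYNAMIC_SUFFIXES = (".php", ".py", ".jsp", ".aspx")


class RequestClassifier:
    """Maps a request to a class number k; similar service times share a class."""

    def __init__(self):
        self._dynamic_classes = {}  # object address (path) -> class number

    def classify(self, url, size_bytes=0):
        path = urlsplit(url).path  # the query string is ignored
        if path.endswith(DYNAMIC_SUFFIXES):
            # One class per dynamic object, numbered after the static classes.
            first_dynamic = len(STATIC_SIZE_BOUNDS) + 2
            new_class = first_dynamic + len(self._dynamic_classes)
            return self._dynamic_classes.setdefault(path, new_class)
        # Static object: the class follows from the size interval (1-based).
        return bisect_right(STATIC_SIZE_BOUNDS, size_bytes) + 1


classifier = RequestClassifier()
k_image = classifier.classify("https://example.com/logo.png", size_bytes=5_000)
k_index = classifier.classify("https://example.com/index.php?id=3")
k_cart = classifier.classify("https://example.com/cart.php")
```

With these assumed thresholds, static objects fall into classes 1 to 5, and dynamic objects receive classes from 6 upwards in order of first appearance.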
The queue module stores requests in the queue $Q_i$. The queue policy is EDF (Earliest Deadline First), and the requests are ordered according to the deadlines $d_{i-j}, \ldots, d_i$ calculated in the service model for each request put into the queue. The deadline $d_i$ determines when the service of the request $r_i$ should begin on the web servers. The request $r_i$ can leave the queue $Q_i$ if it is first in the queue and the number $n_i$ of requests being serviced in the web system is not greater than $n_{max}$. The value $n_{max}$ is the maximum number of requests serviced at the same time in the web system. If $n_i < n_{max}$, the request is not placed in the queue but is passed directly to the switching section and serviced in the web system.
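The behavior of the queue module can be illustrated with a short Python sketch built on a binary heap. The class and method names are hypothetical, deadlines are supplied by the caller, and the strict comparison with `n_max` on release is an assumption made so that the number of requests in service never exceeds `n_max`.

```python
import heapq
from itertools import count


class EDFQueue:
    """Earliest Deadline First queue with a limit on concurrently serviced requests."""

    def __init__(self, n_max):
        self.n_max = n_max      # maximum number of requests in service
        self.in_service = 0     # current number of requests in service (n_i)
        self._heap = []         # entries: (deadline, sequence number, request)
        self._seq = count()     # keeps FIFO order among equal deadlines

    def arrive(self, request, deadline):
        """Return the request if it bypasses the queue, otherwise None."""
        # Assumption: a request bypasses the queue only when nothing is waiting.
        if self.in_service < self.n_max and not self._heap:
            self.in_service += 1
            return request
        heapq.heappush(self._heap, (deadline, next(self._seq), request))
        return None

    def finish(self):
        """Register a finished service; return the next request to dispatch, if any."""
        self.in_service -= 1
        if self._heap and self.in_service < self.n_max:
            _, _, request = heapq.heappop(self._heap)
            self.in_service += 1
            return request
        return None


queue = EDFQueue(n_max=1)
first = queue.arrive("a", deadline=5.0)   # dispatched immediately
queue.arrive("b", deadline=9.0)           # queued
queue.arrive("c", deadline=7.0)           # queued, earlier deadline than "b"
released = [queue.finish(), queue.finish(), queue.finish()]
```

The request with the earliest deadline ("c") leaves the queue before the one that arrived earlier ("b"), which is the reordering effect the EDF policy is meant to provide.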
Queueing requests at the front of the system has many advantages, but it also has some disadvantages. Queueing gives the opportunity to reorder requests so that the problematic ones are serviced first. Unfortunately, keeping requests in the queue prevents them from being serviced on the web servers. Therefore, it is very important to find the balance between the length of the queue and the number of requests being serviced in the rest of the system. The value $n_{max}$ determines both the queue length and the number of serviced requests. It is calculated for the web system from $N$ and $n_{min}$, where $N$ is the mean number of requests in the web system (the sum of queued and serviced requests), calculated periodically after servicing a determined, large number of requests, and $n_{min}$ is the minimum value that can be assigned to $n_{max}$. In the experiments described in Section 4, it was obtained that $n_{max} = 500$, and the mean value $N$ and $n_{max}$ were recalculated after servicing each 10,000 requests.
The service model calculates the deadline $d_i$ and a term for each request. As mentioned earlier, the deadline $d_i$ indicates the time at which the service of the request $r_i$ should start. The term is the moment when the service should be finished and the response should be sent to the client. It is calculated for every request from the moment of arrival of the request $r_i$. The deadline $d_i$ is calculated only for requests being put into the queue module (not sent directly to the switching section). In order to calculate the deadlines, the service module stores information about service times in the vector $U_i$. Its element $t_{ki}$ is the service time of requests belonging to the $k$th class and is the most recent information available at the moment of arrival of the request $r_i$. The time $t_{ki}$ is always updated when the service of a request belonging to the $k$th class is finished and the previous request is stored in the queue module.
The new time is calculated from the previous value $t_{ki}$ and $\tilde{t}_i$, where $\tilde{t}_i$ is the measured service time of the request $r_i$, and $\eta$ is the adaptation ratio determining the speed of changes, $\eta \in (0, 1)$. The time $\tilde{t}_i$ is measured by the switching section from the moment when the request is sent to the rest of the system to the moment when the response to the request arrives at the WCEDF web switch. It should be noted that the time $t_{ki}$ is calculated for an almost constant load of the system equal to $n_{max}$. However, because $n_{max}$ can change periodically, the adaptation ratio $\eta$ should be large enough to let the vector $U_i$ adapt quickly to the new conditions. Preliminary experiments showed that $\eta$ should be equal to 0.6.
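A minimal sketch of the service model is given below under explicitly stated assumptions: the term is taken to be the arrival moment plus the adopted limit, the deadline is the term minus the stored class service time, and the update is an exponential moving average with the adaptation ratio. These forms were chosen to match the prose description and may differ from the exact formulas of the method; all names are hypothetical.

```python
class ServiceModel:
    """Keeps one service-time estimate per request class and derives deadlines."""

    def __init__(self, n_classes, t_max, eta=0.6):
        self.t_max = t_max            # adopted limit on the service time
        self.eta = eta                # adaptation ratio, 0 < eta < 1
        self.u = [0.0] * n_classes    # class service times (0-based class index)

    def term(self, arrival):
        # Assumption: the response is due t_max after the arrival moment.
        return arrival + self.t_max

    def deadline(self, arrival, k):
        # Assumption: service must start early enough to finish by the term.
        return self.term(arrival) - self.u[k]

    def update(self, k, measured):
        # Assumption: exponential moving average towards the measured time.
        self.u[k] += self.eta * (measured - self.u[k])


model = ServiceModel(n_classes=2, t_max=1.0, eta=0.5)
model.update(0, measured=0.4)   # estimate moves halfway from 0.0 to 0.4
model.update(0, measured=0.4)   # and halfway again
d = model.deadline(arrival=10.0, k=0)
```

With a larger ratio such as the value 0.6 reported in the text, the stored time follows recent measurements more closely, which is the behavior the text asks for when the load limit changes periodically.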

Switching Section
After leaving the scheduling section, the request $r_i$ is passed to the switching section. This section distributes HTTP requests among the web switches in the availability zones. The request is processed as follows. First, the service time is estimated for each availability zone in the system. Then the section decides which zone should service the request. In most cases, the most loaded zone is chosen, provided that it is able to service the request within the time constraints. In the next step, the request is sent to the chosen zone. After the service is finished, the response is sent back to the switching section. The section measures the service time and the load of the system. It also updates information about service times. In the end, the response is sent to the client.
The switching section contains three types of modules: zone model, decision, and execution modules. The number of zone model modules is equal to the number of availability zones in the cloud. Each zone model corresponds to one zone.
The zone model is responsible for estimating the service time $\hat{t}^w_{ki}$ of the request $r_i$ for the corresponding $w$th zone, where $w \in \{1, \ldots, W\}$. The estimation is based on the load of the zone $M^w_i = [e^w_i, f^w_i]$, where $e^w_i$ is the overall number of requests currently being serviced by the zone, and $f^w_i$ is the number of dynamic requests being serviced. The load $M^w_i$ is collected by the execution module. The zone model adapts to the changing environment after the request's service by taking into account the measured service time $\tilde{t}_i$, but only if the corresponding zone was chosen to service the request. The model owes its adaptation capabilities to the use of a fuzzy-neural mechanism. The module's fuzzy logic structure is based on the Mamdani model [45], and it allows the calculation of the service time on the basis of the load, while the neural nature of the module permits it to tune its parameters. The structure of the presented model is described in detail in [41], and it is used in the WCEDF switch due to its advantages, including good quality of time estimation and the possibility of adaptation to the changing environment.
The structure of the neuro-fuzzy network used in the zone model is presented in Figure 3a. Superscripts indicating the executor's number ($w$) have been omitted in the figure as well as in the other markings in the rest of the description of the zone model. The estimated service time is calculated in a way that is typical for fuzzy logic systems. The adaptation abilities are, however, connected with the neural nature of the zone module. The Back Propagation Method [46] is used to tune both the input and output fuzzy set parameters. The adaptation is performed each time the zone corresponding to the zone model finishes the request's service and the measured service time $\tilde{t}_i$ is passed from the execution module to the zone model.
The initial values of the input fuzzy set function parameters $C_{ki}$ and $D_{ki}$ are evenly distributed over the space of executor operation. The output fuzzy set function parameters $S_{ki}$ are initially set to zero. In this way, the estimated service times are close to zero at the beginning of the operation of the web switch. When servicing subsequent requests, the zone model tunes the parameters, and the estimated service times become longer and closer to the real service times. The rate of change depends on the adaptation ratios. Preliminary experiments determined their optimal values, $\eta_s = 0.01$ and $\eta_c = \eta_d = 0.4$. Furthermore, during the experiments, the optimal number of input fuzzy sets was determined as $L = M = 10$, and the number of output fuzzy sets is equal to $J = L \cdot M$ [41].
The decision module is the key element of the switching section. It chooses the availability zone to service the request according to an algorithm designed especially for systems providing predictable service. The decision is made on the basis of the estimated service times $\hat{t}^1_{ki}, \ldots, \hat{t}^w_{ki}, \ldots, \hat{t}^W_{ki}$ and of $\Delta d_i$, the time remaining to service the request before reaching the deadline $d_i$ (calculated in the scheduling section's service model), where $\Delta d_i = d_i - \tau^{(2)}_i$ and $\tau^{(2)}_i$ is the moment the request enters the scheduling section. The zone chosen is the one with the lowest index $w$ that offers a service time allowing the request to be serviced before the deadline $d_i$. In this way, in most cases, the zone with the lowest index $w = 1$ is chosen to service the request, and this zone becomes the most loaded one. Each zone with a greater index $w$ is less loaded and is therefore ready to service more demanding requests. If the zones are overloaded and none of them is able to offer a satisfactory service time, then a classical solution known from the FNRD method [41] is used, and the zone offering the shortest service time is chosen.
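The decision rule described in this paragraph can be written as a short Python function. The function name and the zero-based zone indices are illustrative, and treating a zone as acceptable when its estimate is less than or equal to the remaining time is an assumption about the exact comparison.

```python
def choose_zone(estimates, remaining):
    """Pick a zone index given per-zone service-time estimates.

    estimates -- estimated service times, one per zone, ordered by zone index
    remaining -- time left before the deadline of the request
    """
    # Prefer the lowest-indexed zone that can meet the deadline, which keeps
    # the higher-indexed zones lightly loaded for demanding requests.
    for w, t_hat in enumerate(estimates):
        if t_hat <= remaining:
            return w
    # Fallback when no zone can meet the deadline: shortest estimated time,
    # as in the FNRD strategy.
    return min(range(len(estimates)), key=estimates.__getitem__)


fits_second = choose_zone([0.9, 0.3, 0.1], remaining=0.5)   # zone 0 is too slow
fits_first = choose_zone([0.2, 0.3], remaining=0.5)         # zone 0 is sufficient
overloaded = choose_zone([0.9, 0.7, 0.8], remaining=0.5)    # nobody fits
```

Note that in the first call the function does not pick the fastest zone (index 2); it picks the first zone that is fast enough, which concentrates the load on the low-indexed zones.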
The execution module is responsible for physically sending the requests through the network to the availability zones. This module also receives responses and sends them back to clients. The second duty of the module is to collect information regarding requests being serviced in the zones. The module measures the service times $\tilde{t}_i$, the number of concurrently serviced requests $n_i$, and the load of the zones $M^1_i, \ldots, M^w_i, \ldots, M^W_i$, $i = 1, \ldots, I$. The measured values are passed to other modules in the web switch.
It is worth noting that all of the information necessary to queue requests and distribute them is available within the web switch and does not need to be fetched from zones. This is very convenient for maintenance reasons.

Experiments and Discussion
Research and experiments conducted up until now on web switches and brokers using presented neuro-fuzzy models showed that the devices could make high-quality decisions and cooperate well with each other in two-layer architecture. The solution proposed in this article also uses neuro-fuzzy models together with a queueing module and a new decision algorithm. The results of experiments presented in this section allow the proposed new distribution method and the design of the WCEDF web switch to be evaluated.

Testbed
The experiments were conducted in a simulation environment for web switches working in two-layer web cloud architecture. The simulation program was written in the OMNeT++ programming tool, which provides appropriate libraries and environment for conducting networking systems simulations [47].
The program was divided into the following logical modules: HTTP request generator, WCEDF web switch, FNRD web switch, web servers, and database servers. The scheme of the simulation program is presented in Figure 4.
The request generator in the simulation program was responsible for generating HTTP requests. It contained submodules simulating the behavior of web clients, which are web browsers controlled by human users. To get a web page, the simulated web client first downloads the HTML document and then opens up to six TCP (Transmission Control Protocol) connections in order to retrieve other elements such as pictures, JavaScript, or CSS files. The number of web pages downloaded by clients was modeled according to the inverse Gaussian distribution (µ = 3.86, λ = 9.46), whose parameters were determined on the basis of research on real users' behavior [48]. The time between downloading subsequent pages was modeled according to the Pareto distribution (α = 1.4, k = 1). After the client finished downloading, its process was deleted and a new one was invoked.
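The two distributions used by the request generator can be sampled with the Python standard library, as in the sketch below. It is not the OMNeT++ code used in the experiments. The inverse Gaussian sampler uses the transformation method of Michael, Schucany, and Haas; rounding the page count to an integer of at least one and the function names are assumptions made for illustration.

```python
import math
import random


def inverse_gaussian(mu, lam, rng):
    """Draw one sample from an inverse Gaussian distribution (mean mu, shape lam)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = (mu + (mu * mu * y) / (2 * lam)
         - (mu / (2 * lam)) * math.sqrt(4 * mu * lam * y + (mu * y) ** 2))
    # Accept the smaller root with probability mu / (mu + x), else the larger.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


def pages_per_session(rng, mu=3.86, lam=9.46):
    # Assumption: the continuous sample is rounded, with at least one page.
    return max(1, round(inverse_gaussian(mu, lam, rng)))


def think_time(rng, alpha=1.4, k=1.0):
    # random.paretovariate samples a Pareto distribution with scale 1,
    # so the result is multiplied by the scale parameter k.
    return k * rng.paretovariate(alpha)


rng = random.Random(42)
samples = [inverse_gaussian(3.86, 9.46, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)   # should be close to mu = 3.86
pages = pages_per_session(rng)
pause = think_time(rng)
```

Because α = 1.4 is below 2, the Pareto think time has infinite variance, so occasional very long pauses between pages are expected in such a model.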
The clients simulated the downloading of web pages. The parameters of the web pages (the sizes, types, and number of objects on a page) were the same as those of the very popular site https://www.sonymusic.com (accessed on 23 March 2021) [49] running on WordPress.
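The client behavior described above can be illustrated with a short sampling sketch. It is not taken from the simulator; the function names, the use of the Michael-Schucany-Haas transformation for the inverse Gaussian variate, and the rounding of the page count up to a whole number are assumptions made for illustration only.

```python
import math
import random


def inverse_gaussian(mu: float, lam: float, rng: random.Random) -> float:
    """Draw one inverse Gaussian variate (Michael-Schucany-Haas method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + (mu * mu * y) / (2 * lam) - (mu / (2 * lam)) * math.sqrt(
        4 * mu * lam * y + mu * mu * y * y
    )
    # Accept the smaller root with probability mu / (mu + x),
    # otherwise return the complementary root mu^2 / x.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


def pages_per_session(rng: random.Random) -> int:
    """Number of pages downloaded by one simulated client.

    Parameters mu = 3.86 and lambda = 9.46 come from the text; rounding up
    to at least one whole page is an assumption of this sketch.
    """
    return max(1, math.ceil(inverse_gaussian(3.86, 9.46, rng)))


def think_time(rng: random.Random) -> float:
    """Pause between consecutive pages: Pareto with alpha = 1.4 and k = 1."""
    k = 1.0
    return k * rng.paretovariate(1.4)
```

A simulated client would call `pages_per_session` once and then `think_time` between consecutive page downloads.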
The WCEDF web switch module distributed HTTP requests among zones in the simulator. Five request distribution strategies are implemented in the module. The first three are popular strategies used in AWS [10] web switches, while the last two can be classified as intelligent:

• Round Robin (RR): assigns HTTP requests to subsequent zones;
• Least Load (LL): assigns incoming HTTP requests to the zone with the lowest number of serviced HTTP requests;
• Path-based routing (P): requested resources are assigned to a designated zone or web server. In the experiments, a modification of the algorithm was used. It adapted to changing load and behaved like the LARD algorithm [50]. According to this, when the server is overloaded, the service of a given type of requests is moved to the least-loaded server;
• Web Cloud Earliest Deadline First (WCEDF): the strategy presented in this article;
• Fuzzy-Neural Request Distribution (FNRD): the FNRD strategy is in some ways similar to the WCEDF strategy. It also uses neuro-fuzzy models to estimate service time. However, the FNRD strategy does not queue the requests, and the distribution algorithm chooses a web server offering the shortest service time.
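The selection rules of three of the strategies listed above can be sketched in a few lines. This is a simplified illustration, not the simulator's code: the class and function names are hypothetical, and the list of estimated service times stands in for the output of the neuro-fuzzy models, which are not reproduced here.

```python
from itertools import count
from typing import Sequence


class RoundRobin:
    """Assigns each request to the next target (zone or server) in turn."""

    def __init__(self, n_targets: int) -> None:
        self._n = n_targets
        self._counter = count()

    def choose(self) -> int:
        return next(self._counter) % self._n


def least_load(active_requests: Sequence[int]) -> int:
    """Index of the target currently servicing the fewest requests."""
    return min(range(len(active_requests)), key=active_requests.__getitem__)


def shortest_estimated_time(estimates: Sequence[float]) -> int:
    """FNRD-style choice: the target with the shortest estimated service time.

    The estimates are assumed to be supplied by an external model.
    """
    return min(range(len(estimates)), key=estimates.__getitem__)
```

In all three cases the decision uses only information held by the switch itself, which matches the observation that no data need to be fetched from the zones.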
The FNRD web switch module was responsible for distributing requests inside zones among web servers. The switch used the following strategies: Round Robin, Least Load, Path-based routing, and Fuzzy-Neural Request Distribution (FNRD).
The WCEDF and FNRD web switches were modeled in the simulation as a single queue. The distribution decision service times were measured on a real server with an Intel Xeon E5-2640 v3 processor and were as follows: LL 0.0103 µs, RR 0.00625 µs, P 0.0101 µs, FNRD 0.2061 µs, and WCEDF 0.2085 µs.
The web servers in the simulation program were composed of the elements causing the most significant delays in servicing requests, namely the processor and the SSD drive. Each of these resources was modeled as a single queue. The main memory was also modeled; it acted as cache memory for the file system and used the least-recently-used policy. The service times for both the processor and the SSD drive were obtained in experiments with a website using WordPress and running on a server with an Intel Core i7 7800X CPU, a Samsung SSD 850 EVO drive, and 32 GB RAM.
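The role of the main memory as a least-recently-used file cache can be sketched as follows. The class name, the byte-based capacity, and the rule that a file larger than the whole cache is never stored are assumptions of this sketch; the simulator's actual implementation is not described in the text.

```python
from collections import OrderedDict


class LruFileCache:
    """File-system cache with least-recently-used eviction."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self._files = OrderedDict()  # path -> size, oldest entry first

    def access(self, path: str, size: int) -> bool:
        """Return True on a cache hit (no SSD read), False on a miss."""
        if path in self._files:
            self._files.move_to_end(path)  # mark as most recently used
            return True
        if size <= self.capacity:
            # Evict least recently used files until the new one fits.
            while self.used + size > self.capacity:
                _, evicted_size = self._files.popitem(last=False)
                self.used -= evicted_size
            self._files[path] = size
            self.used += size
        return False
```

On a miss, the simulated request would additionally pass through the SSD queue; on a hit, only the processor queue would be involved.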
The database server was modeled as a single queue. The service times for the server were obtained for the same hardware as the web server was using.
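A resource modeled as a single queue can be simulated with a simple recursion over arrival times and service demands. The sketch below assumes a first-in-first-out discipline and one server per queue; the text does not state the discipline, so this is an illustrative choice, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def fifo_single_server(jobs: Iterable[Tuple[float, float]]) -> List[float]:
    """Response times of jobs passing through one FIFO queue.

    jobs: (arrival_time, service_demand) pairs sorted by arrival time.
    Returns waiting time plus service time for each job.
    """
    responses = []
    server_free_at = 0.0
    for arrival, demand in jobs:
        # A job starts when it has arrived and the server is free.
        start = max(arrival, server_free_at)
        server_free_at = start + demand
        responses.append(server_free_at - arrival)
    return responses
```

For example, a job arriving at time 1.0 while the server is busy until 2.0 waits one time unit before its own service begins.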
The experiments were conducted for 12 web servers working in three zones (four web servers in each zone). This configuration is the most efficient for a web cloud system with a two-layer architecture controlled by cooperating FNRD web switches [43]. As mentioned earlier, the WCEDF method uses the WCEDF web switch in the region and FNRD web switches in the availability zones. In the presented experiments, the WCEDF method was compared with cloud-based web systems that use the same distribution strategy on both layers: FNRD (configuration marked FNFN), Least Load (configuration marked LL), Round Robin (configuration marked RR), and Path-based routing (configuration marked P). Previously published results of experiments revealed that the FNFN configuration is very efficient and that the obtained service times can be as much as two times shorter than for other popular distribution strategies used in two-layer configurations [42,43].

Experiments
To evaluate the results of the experiments, several metrics were used: the mean service time and the 95th and 98th percentiles of the service time. One of the main aims of the WCEDF method is to service requests in a time shorter than the required value $t_{max}$. The new approach therefore requires an adequate quality factor that allows the most important features of the proposed method to be evaluated. The mean value of satisfaction was chosen; it is often used to evaluate the performance of soft real-time systems.
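The value of satisfaction was calculated as follows:

$$\mathrm{satisfaction}(t) = \begin{cases} 1 & \text{if } t \le t^{s}_{max}, \\ 0 & \text{if } t \ge t^{h}_{max}, \\ \dfrac{t^{h}_{max} - t}{t^{h}_{max} - t^{s}_{max}} & \text{in other cases,} \end{cases} \qquad (10)$$

where $t$ is the service time, $t^{s}_{max}$ is the "soft time" after which satisfaction starts decreasing to 0, and $t^{h}_{max}$ is the "hard time" after which the user will leave the page. Figure 5 presents the satisfaction as a function of the service time. In the experiments, it was assumed that $t^{s}_{max} = t_{max}$ and $t^{h}_{max} = 2 \cdot t_{max}$.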
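The satisfaction measure can be sketched in code as follows. The function and parameter names are hypothetical, and the linear decrease between the soft and the hard time is how this sketch reads the definition of a value that "starts decreasing to 0"; it should be treated as an assumption rather than as the simulator's implementation.

```python
from typing import Iterable


def satisfaction(t: float, t_soft: float, t_hard: float) -> float:
    """Satisfaction for a single request with service time t."""
    if t <= t_soft:
        return 1.0
    if t >= t_hard:
        return 0.0
    # Assumed linear decrease between the soft and the hard time.
    return (t_hard - t) / (t_hard - t_soft)


def mean_satisfaction(service_times: Iterable[float], t_max: float) -> float:
    """Mean satisfaction with t_soft = t_max and t_hard = 2 * t_max."""
    values = [satisfaction(t, t_max, 2 * t_max) for t in service_times]
    return sum(values) / len(values)
```

With a time constraint of 0.5 s, a request serviced in 0.75 s contributes a satisfaction of 0.5 under this reading, and any request serviced in 1 s or more contributes 0.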
Each experiment was conducted for different loads, measured as the number of simulated clients, which varied from 100 to 2500. During a single experiment, 40 million HTTP requests were served: the first 10 million formed the warm-up phase, and the service time was measured for the remaining 30 million.
The experiments were conducted for four adopted values of $t_{max}$: 0.5 s, 0.75 s, 1 s, and 2 s. Figure 6 presents the results of the experiments for $t_{max}$ = 0.5 s.
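The way the reported metrics are derived from a run, that is, discarding the warm-up requests and then computing the mean and the 95th and 98th percentiles of the service time, can be sketched as below. The function names are hypothetical, and the nearest-rank percentile definition is an assumption; the text does not say which percentile estimator was used.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an ascending sequence (0 < p <= 100)."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(service_times: Sequence[float], warmup: int) -> Dict[str, float]:
    """Mean, 95th and 98th percentile after dropping the warm-up requests."""
    measured = sorted(service_times[warmup:])
    return {
        "mean": sum(measured) / len(measured),
        "p95": percentile(measured, 95),
        "p98": percentile(measured, 98),
    }
```

In the setting described above, `warmup` would correspond to the first 10 million of the 40 million served requests.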
As can be seen in Figure 6, when the load of the system was low, the mean response time, the 95th percentile, and the 98th percentile were the lowest for the FNFN configuration. The FNFN configuration was very effective in minimizing HTTP request service times. When the load increased, the 95th and 98th percentiles became lower for the WCEDF strategy, which was especially noticeable in the case of the 98th percentile. The WCEDF strategy was able to keep the service time within the constraints ($\tilde{t}_i < t_{max}$) when the load was low and to keep the time as low as possible when the load increased.
Under heavier loads, the FNFN configuration had lower mean service times. However, it also produced requests with long service times, which is noticeable on the 98th percentile graph. The same phenomenon can be seen on the graph of the cumulative distribution of the service time (Figure 6d), which shows the results for a load of 2300 clients. About 80% of requests were serviced by the FNFN cloud in a shorter time than by the WCEDF method. However, the longest 20% of service times also belonged to the FNFN configuration, which indicates that the standard deviation was greater for FNFN than for WCEDF. The WCEDF method, in most cases, offered a longer mean service time, but the flow of requests was much more predictable and structured, and no requests were serviced for an excessively long time.
Results obtained for the classical configurations (LL, RR, and P) were much worse and differed significantly from those of the intelligent solutions.
Figure 7 presents graphs of satisfaction for different values of $t_{max}$. As can be seen, the satisfaction was the highest for the WCEDF method, especially when the load was heavy. When the load was low, the measured satisfaction was high and close to 1 for each of the distribution strategies and methods in each experiment. The LL, RR, and P strategies obtained poor results under a heavier load, especially for the shorter $t_{max}$ values of 0.5 s and 0.75 s. The satisfaction for the WCEDF method was close to that of the other distribution strategies in the experiments with $t_{max}$ = 2 s. In this case, the obtained results were good because the constraints were not narrow and were easy to meet even for the classical, non-intelligent strategies.
Summing up the results, the WCEDF method meets the expectations that were set during its design. The web cloud controlled by the WCEDF method was highly efficient, and it obtained high satisfaction even when the constraints were narrow and the load was heavy. The flow of requests was much more predictable and structured in the case of the WCEDF than for the FNFN configuration.

Conclusions and Directions for Future Research
The WCEDF request distribution method used in cloud-based web systems comprises two main elements: the WCEDF web switch and the FNRD web switches. The WCEDF web switch allows the quality of service to be maintained and makes the flow of HTTP requests inside the cloud much more structured and shaped. The FNRD web switches allow requests to be distributed inside the availability zones in an almost time-optimal way [42].
The novelty of the method is a combination of:
• application of the queue of requests in the front of the web system with the method of determining the deadline service times;
• distribution algorithm, designed especially for the proposed solution;
• application of FNRD web switches in the second layer of the decision system.
The neuro-fuzzy models used to estimate the service times proved to be accurate.
The promising results obtained for the WCEDF method indicate that the WCEDF switch most likely cooperates with the FNRD switches, as was observed for FNRD switches located in both the region and the zones [43]. This hypothesis will be verified in further publications.
The WCEDF method makes the service of the web cloud very efficient and, at the same time, more predictable than is the case with other methods providing the best-effort quality of service. Using this method, however, does not guarantee that the web system can maintain the required quality. The useful features of the WCEDF method encourage the development of a new version of the method that would guarantee the quality of service in cloud-based web systems.

Summary
A new HTTP request distribution method for cloud-based web systems was presented in the article. The proposed WCEDF method is applicable in two-layer architectures of cloud-computing web systems. It allows HTTP requests to be serviced in time constraints within the established quality of the web service. The main element used in the method is the WCEDF web switch. It queues incoming requests taking into account the time constraints and distributes the requests in a way that keeps selected parts of the cloud idle and ready to service time-consuming requests faster.
To evaluate the WCEDF method, a simulation environment was implemented. The simulator imitated the behavior of web clients, as well as the work of the web switches and of both the web and the database servers.
The results of the experiments show that the WCEDF method met the expectations. The web cloud controlled by the WCEDF method was efficient and able to service requests within time limits, even when the constraints were narrow and the load was heavy. In each experiment, the value of satisfaction for the new method was the highest among the tested methods. The flow of requests inside the cloud was much more predictable and structured in the case of the WCEDF than for other distribution strategies. The obtained results indicate that the WCEDF web switch most likely cooperates with the intelligent FNRD switches.
The research results indicate that the new solution is important and that its research and development should be continued.