Game developers have used a wide range of approaches to enable online multiplayer games since they became popular in the 1990s. In this section, we survey the state of the art in the deployment of MMOGs and identify the unique attributes of these applications. Based on these, we categorize each approach according to the specified criteria so they can be easily compared. For this paper, we focus on the realization of the backend aspects of MMOGs.
As with infrastructure, developers and researchers use multiple types of architectures to support their MMOGs. These architectures vary in their organization and communication patterns but can be categorized into three main groups.
5.2.1. Client-Server Architecture
The client-server architecture is a distributed structure that partitions the workload among different resource providers (called servers) and resource requesters (called clients). Machines in a distributed system use messages in a pre-defined language to communicate. In a typical client-server system, the client can request resources from a server through these messages and the server responds by providing the resource. Client devices are not aware of one another and cannot message each other—unless the messages are explicitly forwarded by the servers.
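The request/response pattern described above can be sketched as a minimal in-memory simulation. The class names and resources below are illustrative assumptions, not from any cited system; the point is only that clients hold a reference to the server and never to each other.

```python
# Minimal sketch of client-server messaging: clients request resources
# from the server and cannot contact one another directly.

class GameServer:
    """Holds authoritative resources; all traffic flows through here."""
    def __init__(self):
        self.resources = {"map": "forest", "motd": "welcome"}

    def handle_request(self, resource_name):
        # Respond to a client's resource request (None if unknown).
        return self.resources.get(resource_name)

class Client:
    def __init__(self, server):
        self.server = server  # clients know the server, not other clients

    def request(self, resource_name):
        return self.server.handle_request(resource_name)

server = GameServer()
alice, bob = Client(server), Client(server)
motd = alice.request("motd")  # resources are reachable only via the server
```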
Assiotis and Tzanov [30] discuss architectures for MMOGs and propose a centralized distributed architecture (i.e., client-server) to support a large number of concurrent users without sacrificing efficiency or security. As described, the client-server architecture traditionally uses a single server that handles traffic to and from all clients; games such as Quake use this architecture. From an architectural standpoint, this presents challenges. First, supporting a large number of players means that a lot of data must be communicated over the internet; quoting the authors of Reference [30], “[…] as the information transferred between the players and the game server is large, the bandwidth required to support a huge number of players is enormous.” Second, very large worlds require “huge computational power” to simulate, meaning that processing needs to be split across multiple computing nodes.
At the center of this study is the concept of locality of interest. Using this concept, game developers separate large worlds into smaller regions that can be hosted on different nodes, allowing both bandwidth and computational power requirements to be spread over many nodes. However, this architecture creates new challenges:
Players are not always interested in receiving updates about certain areas of the map—especially if these areas are far away.
When two players are near the border between two regions, they still need to see and interact with each other; this is not trivial when the regions are hosted on separate nodes.
Regardless of synchronization scheme, there is a possibility the game state will be invalid for events that occur near borders and affect players on both sides.
To solve the first problem, the authors introduce a concept called Area of Interest (AoI). In this concept, each player has their own AoI from which they are able to receive event updates—the area spans outwards from the player’s position for a certain distance. Players are naturally not interested in receiving updates about events they cannot see or hear because they are too far away. The size of an AoI can vary depending on the player type. For example, a player carrying a sniper rifle needs to have a larger AoI than a player with a pistol because of the range of his equipment. As a result of this concept, players are only subscribed to a limited area of the game world drastically reducing the bandwidth required to communicate the game’s state.
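The AoI idea can be sketched as a small filtering routine. The radii, equipment names and event data below are illustrative assumptions (echoing the sniper-versus-pistol example), not values from Reference [30]:

```python
import math

# Sketch of Area-of-Interest (AoI) filtering: a player receives only the
# events within a radius around their position; the radius depends on the
# range of the player's equipment.

def aoi_radius(equipment):
    # Hypothetical per-equipment ranges; real values are game-specific.
    return {"sniper": 300.0, "pistol": 80.0}.get(equipment, 100.0)

def events_in_aoi(player_pos, equipment, events):
    """Return only the events inside the player's AoI."""
    r = aoi_radius(equipment)
    return [e for e in events if math.dist(player_pos, e["pos"]) <= r]

events = [{"name": "shot", "pos": (50, 0)},
          {"name": "explosion", "pos": (250, 0)},
          {"name": "footstep", "pos": (500, 0)}]

# A sniper at the origin is subscribed to more events than a pistol carrier.
sniper_view = events_in_aoi((0, 0), "sniper", events)
pistol_view = events_in_aoi((0, 0), "pistol", events)
```

Subscribing each player only to events that pass this filter is what reduces the bandwidth needed to communicate the game state.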
For the second and third problems, the authors have identified four distinct scenarios that need to be handled when players are near border areas:
A player standing near the border of two regions hosted by different servers needs to be able to receive event updates within their AoI from both servers.
A player may suddenly move to an area handled by a different server (this is usually known as “teleporting” in games).
An event originating in one server may end up in a region covered by another server. A typical example of this is shooting a rocket that travels from one area to another before exploding.
An event that occurs near the border may affect multiple regions, hosted on different servers. An example of this is a bomb exploding at a border, affecting players in adjacent regions.
By describing further solutions to these problems (for example, subscribing a player to both servers when they are within a certain distance of their border), the authors provide a novel strategy for distributing large-scale worlds on a client-server architecture. The results of this study show that, using these solutions, the authors managed to improve the efficiency of the client-server architecture and thus the performance of MMOGs.
Nae et al. [20] argue that “today’s MMOGs operate as client/server architectures”. Specifically, the game server simulates a world via computational and data operations, receiving and processing commands from the clients. Based on these commands, the server computes a global state of the game world representing the positions of, and interactions between, entities. Finally, the server sends responses containing the new state back to the client devices, which render this information for the player. Nae et al. [16] argue that a good game experience is paramount to keeping players engaged and has a direct impact on the income of the game operators. For this reason, operating an efficient architecture is of huge importance. To support thousands of simultaneous players, the authors describe three parallelization techniques commonly used with client-server architectures:
Zoning: Partitions the game world into areas that are “handled independently by separate machines”. This technique is particularly useful in slow-paced games such as MMORPGs.
Replication: Parallelizes game sessions with large numbers of players gathering in certain hot-spots. Each server computes the state of a number of “active entities” hosted on it, while synchronizing the state of “shadow entities” hosted on other machines. This technique is primarily used in fast-paced games such as FPS games.
Instancing: “Distributes the session load by starting multiple parallel instances of highly populated zones”. These zones are independent from each other.
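The zoning technique above can be sketched as a simple grid partition. The zone size, server list and round-robin assignment below are assumptions for illustration, not details from Reference [20]:

```python
# Sketch of zoning: the game world is split into a grid of zones, each
# handled independently by a separate machine.

ZONE_SIZE = 100  # world units per zone edge (assumed value)

def zone_for(x, y):
    """Map a world position to the (col, row) zone that owns it."""
    return (int(x // ZONE_SIZE), int(y // ZONE_SIZE))

def server_for(x, y, servers):
    """Assign each zone to one server from a fixed list."""
    col, row = zone_for(x, y)
    return servers[hash((col, row)) % len(servers)]

servers = ["zone-srv-1", "zone-srv-2", "zone-srv-3"]
owner = server_for(150, 20, servers)  # all positions in a zone share a server
```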
5.2.2. Peer-to-Peer Architecture
Another type of architecture is peer-to-peer (P2P), which partitions the workload among equally privileged (equipotent) peers in a network. This architecture makes each peer a participant in the hosted application: each peer contributes a portion of its resources (such as processing power, storage, etc.) and makes it available to other peers. Each peer can thus be both a client (requester) and a server (supplier) at the same time.
GauthierDickey et al. [31] discuss the peer-to-peer architecture extensively as their selected approach to enable a fully distributed MMOG. They argue that P2P:
Reduces delay for messages and eliminates localized congestion,
Allows players to launch their own games without a lot of investment,
Allows games to overcome bottlenecks of the server-only computation,
Is more resilient and available because it does not have a single point of failure.
In addition, the authors explain how peer-to-peer storage can work in the context of an MMOG. When utilizing peer-to-peer storage, the consistency of data must be guaranteed using mutual exclusion. The two sub-types of P2P architectures described are unstructured and structured networks. In unstructured P2P, clients can transfer files to each other directly, while in structured P2P a distributed hash table is responsible for converting resource names into addresses in the network; this requires each peer to maintain routing tables using special algorithms. The authors of Reference [31] also explain how P2P computation occurs using completely distributed scheduling. Unlike the client-server architecture, peer-to-peer utilizes the down-time of peers to provide computational power to the peers that need it. The authors, however, admit that cheating is an issue with P2P systems that support MMOGs: the nature of this fully distributed approach makes them vulnerable to state manipulation, something that must be addressed to successfully utilize P2P as an MMOG architecture.
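The structured-P2P lookup described above can be sketched with a simple consistent-hashing ring. This is a deliberately minimal stand-in for a distributed hash table: real DHTs such as Chord or Kademlia add per-peer routing tables so lookups take O(log n) hops instead of a local scan. Peer addresses and resource names are made up.

```python
import hashlib
from bisect import bisect_right

# Sketch of DHT-style lookup: hash peers and resource names onto a ring;
# the peer clockwise-nearest to a resource's hash is responsible for it.

def _h(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, peers):
        self.ring = sorted((_h(p), p) for p in peers)

    def lookup(self, resource_name):
        """Return the peer address responsible for a resource name."""
        point = _h(resource_name)
        idx = bisect_right([h for h, _ in self.ring], point)
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring

peers = ["peer-a:4000", "peer-b:4000", "peer-c:4000"]
ring = HashRing(peers)
owner = ring.lookup("map-chunk-17")  # deterministic for a fixed peer set
```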
Kavalionak et al. [32] state that the client-server architecture is the most used option for MMOGs. However, they assert that this approach has inherent scalability limits.
On the other hand, the peer-to-peer approach can offer the following advantages:
Inherent scalability, as the available resources grow with the number of users,
Robustness, as systems using this architecture can self-repair when a peer fails,
Avoidance of bottlenecks, as network traffic is distributed among the users.
Others, like Mildner et al. [33], focus on the performance aspect of architectures for MMOGs. The authors propose a P2P overlay for a Networked Virtual Environment (NVE) for an MMOFPS game. Their approach tries to minimize connection-management overhead to create a highly responsive system. Instead of using sender-oriented message distribution, which most existing systems use, the authors utilize a publish-subscribe mechanism to avoid overlay inconsistencies and map user interest within an NVE more efficiently. Moreover, they propose a Geocast algorithm which sends messages to an arbitrary set of users based on their positions. They implement an NVE system in both a simulation environment and the pre-existing game PlanetΠ4. The results of their experiments show that their approach offers a scalable and consistent overlay by limiting the number of connections per user, which is crucial especially in crowding/flocking scenarios.
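The publish-subscribe idea keyed on spatial interest can be sketched as follows. The class, the circular interest areas and the messages are illustrative assumptions, not the paper's actual overlay or Geocast algorithm:

```python
# Sketch of position-based publish-subscribe: publishers send events at a
# world position, and only subscribers whose interest area covers that
# position are notified.

class SpatialPubSub:
    def __init__(self):
        self.subs = []  # (callback, center, radius) tuples

    def subscribe(self, callback, center, radius):
        self.subs.append((callback, center, radius))

    def publish(self, pos, message):
        """Deliver to every subscriber whose interest circle contains pos."""
        for callback, (cx, cy), r in self.subs:
            if (pos[0] - cx) ** 2 + (pos[1] - cy) ** 2 <= r * r:
                callback(message)

bus = SpatialPubSub()
inbox = []
bus.subscribe(inbox.append, center=(0, 0), radius=50)
bus.publish((10, 10), "nearby shot")     # inside the interest area
bus.publish((200, 200), "distant shot")  # filtered out, never delivered
```

Compared to sender-oriented distribution, the sender here does not need to know who the recipients are; interest management lives in the middleware.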
5.2.3. Hybrid Architecture
Kavalionak et al. [32] also propose a novel cloud-based hybrid architecture consisting of two components: the positional action manager, which manages the positions of entities, and the state action manager, which enables the storage of entity states without the need to transfer them across nodes. The authors claim that this is the “first [work] proposing the integration of P2P and cloud computing”. Furthermore, they explain that hybrid MMOG architectures aim to “exploit and combine” the advantages of both P2P and client-server architectures. An issue with this type of architecture is the strategy for partitioning the virtual environment. Using the first method, spatial partitioning, the game world is divided into regions which are then distributed to the peers, with the most resourceful peer entering an area becoming its manager. For example, the authors of Reference [18] propose a hybrid system which includes a central server and a pool of peers. In this system, the central server hosts the MMOG and distributes the game to other peers once it reaches its full resource capacity. The second method, functional partitioning, delegates important functions to the peers.
Jardine and Zappala [34] utilize the functional partitioning approach and categorize moves within an MMOG into two groups: positional moves, which occur when a player moves in the game environment, and state-changing moves, which change the game state (such as when a player attacks another player). By distinguishing between these two types of events, functional partitioning can be used to delegate only a subset of the total events to the P2P approach. The authors explain that positional moves comprise abstract data (such as the position of a player) and do not contain any entity-specific information, making them easier to distribute among non-reliable peers. On the other hand, state-changing moves contain entity-specific data. In this hybrid system, a central server appoints peers as regional servers which handle positional moves for a specified region of the world. Because they contain only abstract data, the consistency of the game remains intact even if a peer (which is considered non-reliable) abruptly leaves the game session. Conversely, state-changing moves, which contain state-specific data and would compromise game-state consistency if lost, are assigned to the central server, which is more reliable than a regular peer.
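The routing decision at the heart of this functional partitioning can be sketched in a few lines. The concrete event names are assumptions for illustration; only the two-way split between loss-tolerant positional moves and loss-intolerant state-changing moves comes from the paper:

```python
# Sketch of functional partitioning in a hybrid architecture: positional
# moves go to (unreliable) peer regional servers, while state-changing
# moves go to the reliable central server.

POSITIONAL = {"move", "turn", "jump"}        # abstract, loss-tolerant data
STATE_CHANGING = {"attack", "trade", "pickup"}  # must never be lost

def route(event_type):
    """Decide which tier of the hybrid architecture handles an event."""
    if event_type in POSITIONAL:
        return "regional-peer"
    if event_type in STATE_CHANGING:
        return "central-server"
    raise ValueError(f"unknown event type: {event_type}")
```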
More recent advancements in research, such as those proposed by Matsumoto and Okabe [35], further analyze the features of MMOGs and investigate cheats that may be enabled by P2P architectures. After considering the types of cheats possible as well as the detectability of each cheat type, the authors propose a collusion-resilient hybrid P2P framework for MMOGs which utilizes data ‘scrambling’ and the Chinese Remainder Theorem to guard against data corruption. The authors evaluated the proposed framework and compared it with other approaches, concluding that it was more effective, especially in cases where multiple peers colluded with each other, even though it did not completely eliminate the possibility of cheating.
Zhang et al. [36] identify the challenges of Virtual Reality MMOGs (VR-MMOGs): stringent latency, high bandwidth and large scale. They propose a hybrid gaming architecture that distributes work more efficiently by placing local view updates on edge clouds, for faster responses and higher bandwidth, and global state updates on a center cloud, for higher scalability. To achieve this, they use a service placement algorithm which dynamically places a user’s service on edge clouds as the user moves across different access points. To evaluate their approach, the authors conduct simulation experiments comparing it against other gaming approaches. They find that their approach is a “viable solution for supporting VR-MMOGs”.
Similarly, Plumb et al. [37] explore the benefits of adapting hybrid architectures for use with edge servers. They propose AvatarFog, a solution for forming hybrid P2P clusters of nodes that uses game design, instead of the physical structure of the client-server architecture, to decide the network topology. They focus on improving latency between the players, looking at their interactions in the virtual world instead of the physical connections of clients to servers. Their approach groups players together using gameplay as a common factor rather than their position in the world. Using a custom simulator, they evaluate the performance of AvatarFog and conclude that their approach “improves latency and server resources over the traditional server and client model”.
A large variety of storage systems has been utilized in support of MMOGs. In this section, we discuss how different studies have used varying types of data persistence systems, depending on their needs, to enable MMOGs on both dedicated and cloud infrastructures.
For instance, Spanner is “Google’s highly available global SQL database”, which manages data replication and transactions at a large scale [46]. Brewer [46] argues that, based on the CAP theorem, systems can only have two of the three properties: Consistency, Availability and Partition tolerance. In other words, databases which need to be distributed across many nodes, and thus be scalable, cannot be fully consistent and available at the same time. According to Reference [47], one of the two needs to be sacrificed: “Relaxing consistency [allows] the system to remain highly available whereas making consistency a priority means that the system will not be [fully] available”. To work around the CAP theorem, Vogels [47] suggests the use of eventual consistency, a form of weak consistency. Eventual consistency guarantees that, given no new updates to an object, all accesses will eventually return the last updated value. Depending on system load, latency, and so forth, there is a specific inconsistency window during which consistency failures may occur. The author argues that this inconsistency has to be tolerated because (i) it results in an improvement in performance under highly concurrent conditions and (ii) it can handle partitioning of the data which would otherwise render the system unusable.
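The eventual-consistency guarantee and its inconsistency window can be illustrated with a toy pair of replicas using last-writer-wins resolution. This is one common convergence strategy, chosen here for brevity; it is not a mechanism prescribed by Reference [47], and the values are made up:

```python
# Sketch of eventual consistency: replicas may disagree inside the
# inconsistency window, but once updates stop and an anti-entropy sync
# runs, every replica returns the last updated value.

class Replica:
    def __init__(self):
        self.value, self.ts = None, 0

    def write(self, value, ts):
        if ts > self.ts:  # last-writer-wins: keep only the newest write
            self.value, self.ts = value, ts

def sync(replicas):
    """Anti-entropy pass: propagate the newest write to every replica."""
    newest = max(replicas, key=lambda r: r.ts)
    for r in replicas:
        r.write(newest.value, newest.ts)

a, b = Replica(), Replica()
a.write("hp=90", ts=1)  # this update reached only replica a...
b.write("hp=75", ts=2)  # ...and a newer one reached only replica b
# here, inside the inconsistency window, a and b disagree
sync([a, b])            # once updates stop and a sync runs, they converge
```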
MMOGs differ in their requirements from other types of software [38]: (i) latency is more important than throughput, in order to support fast response times for users; (ii) unlike most other applications, MMOGs exhibit a higher ratio of data writes to reads; (iii) users are more willing to tolerate the loss of data due to failure as long as the recovered state remains consistent, in contrast to applications that involve real-world goods or payments.
The authors discuss data storage in Project Darkstar, “an infrastructure for building online game worlds”. They propose a new approach which uses write caching to keep data local to a node while only that node is using it. If another node requires access to the modified data, the owning node flushes it to the central server so that it can be accessed. The suggested approach can lower network latency, since most data changes are consumed on the same node. In addition, it simplifies the addition or removal of nodes and avoids the need for redundancy and backups, because nodes do not store “globally important data”. This allows the storage option to scale while maintaining consistency.
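The write-caching idea can be sketched as below. This is a simplified illustration of the technique described for Project Darkstar, not its actual implementation; the node names, keys and the flush trigger are assumptions:

```python
# Sketch of write caching: each node keeps modified data in a local cache
# and flushes it to the central store only when another node needs it.

class Node:
    def __init__(self, name, central_store):
        self.name = name
        self.central = central_store
        self.cache = {}  # locally modified data, not yet flushed

    def write(self, key, value):
        self.cache[key] = value  # cheap local write, no network hop

    def flush(self, key):
        """Push a cached value to the central store for other nodes."""
        if key in self.cache:
            self.central[key] = self.cache.pop(key)

    def read(self, key):
        # Prefer the local cache; fall back to the central store.
        return self.cache.get(key, self.central.get(key))

central = {}
n1, n2 = Node("n1", central), Node("n2", central)
n1.write("player42.pos", (3, 7))  # stays local to n1 while only n1 uses it
n1.flush("player42.pos")          # another node needs it, so n1 flushes
```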
In their survey of big data and cloud computing, Agrawal et al. [48] outline the features a cloud system must possess to effectively utilize cloud economics: (i) scalability, (ii) elasticity, (iii) fault tolerance, (iv) self-manageability and (v) the ability to run on commodity hardware. The authors argue that traditional RDBMSs are not optimized for use on the cloud, as they were created for enterprise infrastructure, and that “the hefty cost” associated with them makes them less attractive for deploying large-scale applications on the cloud. Instead, they focus on a newer generation of distributed data stores which utilize key-value pairs. This type of data storage, known as NoSQL, has been very successful and widely adopted, mainly because of its ability to scale on cloud systems. Nevertheless, the authors argue that key-value data stores lack functionality when compared to traditional RDBMSs, limiting the set of applications that can be built with them.
Researchers at Google have designed storage systems such as BigTable [49] and Megastore [50] to meet the demands of their online services. They identify several conflicting requirements of modern Internet-based applications, such as MMOGs:
The applications must be highly scalable to accommodate a potentially large audience of users,
Rapid development of features and fast time-to-market is essential for competitiveness,
Services must be responsive, therefore a system must have low latency,
The system should provide a consistent view of data—therefore updates need to be made visible immediately,
Services must be highly available and resilient to multiple types of failure.
While relational databases provide a rich set of features, the authors agree that they are difficult to scale. On the other hand, NoSQL datastores such as Bigtable [49] and Cassandra [51] are highly scalable but have limited APIs, loose consistency and fewer features, which complicates application development. Megastore [50] falls in the middle of these two types of data storage, “[blending] the scalability of NoSQL datastores with the convenience and functionality of a traditional RDBMS”. It provides fully serializable ACID semantics over distant replicas of data with low latency. Additionally, by using synchronous replication, Megastore is able to achieve both high availability and strong consistency at the same time. Megastore relies on an RDBMS-style schema to define its data model, while featuring the row-column structure of NoSQL. Data is entered as entities which contain a set of properties, essentially key-value pairs of strings, numbers and so forth.
Google’s Datastore [3] is one of the latest incarnations of Megastore. The Datastore is a similarly highly available, highly scalable distributed data store available to developers using the Google Cloud Platform. Applications developed to run on Google’s App Engine utilize the Datastore to create web applications that combine the advantages of RDBMS and NoSQL systems. In addition, the Google Query Language (GQL) can be used to run queries on the data efficiently, similar to an RDBMS. GQL is very similar in syntax to SQL, though it has some limitations; one of these is the lack of complex queries such as joins.
More recent research by Diao et al. [51] shows that strong consistency can be provided in scenarios where systems need to be both highly available and scalable, by implementing a lightweight mechanism which detects failures and reacts accordingly. First, they classify MMOG data into four sets: (i) account data, (ii) game data, (iii) state data and (iv) log data. Second, they describe an approach that processes modifications of state data in real time using an in-memory database. These changes can be synchronously propagated to other players in an acceptable amount of time, and the data is backed up to the disk database periodically to allow recovery to a previous state. Furthermore, the authors note that popular MMORPGs such as World of Warcraft and Second Life utilize Relational Database Management Systems (RDBMSs), predominantly MySQL and Microsoft SQL Server. They argue that RDBMSs do not fully satisfy the requirements of MMOGs and propose the use of Cassandra, a cloud data management system. Cassandra supports bulk writes and rare read operations, which is the predominant scenario for MMOGs. Because of Cassandra’s weak support for strong consistency, the authors also propose a solution based on eventual consistency to achieve strongly consistent data storage. Furthering their research in Reference [52], the authors investigate how to benefit from the advantages of cloud data management solutions while addressing their shortcomings with regard to MMOGs. After analyzing typical MMOG architectures and data management requirements, and categorizing MMOG data into four groups, the authors propose using multiple data management systems in a single MMOG to manage the diverse data sets accordingly. Data which requires strong consistency and security (e.g., account data) is managed by an RDBMS, while data requiring scalability and performance (e.g., logs and state data) is stored in a cloud storage system. The authors implemented a simulation environment, where many clients can interact with many servers, and a game prototype based on an open-source MMOG to evaluate their approach. They found that guaranteeing high-level consistency in Cassandra is not efficient, and propose a timestamp-based model that solves this problem.
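The multi-store idea can be reduced to a routing table from data class to storage backend. The split below follows the classification just described (account data to an RDBMS; logs and state data to a cloud store); routing game data to the RDBMS is our assumption, as is every store name:

```python
# Sketch of routing MMOG data classes to the storage system whose
# guarantees fit them best.

DATA_CLASS_TO_STORE = {
    "account": "rdbms",        # needs strong consistency and security
    "game":    "rdbms",        # assumed placement; mostly read-heavy data
    "state":   "cloud-store",  # write-heavy, needs scalability
    "log":     "cloud-store",  # append-only, high volume
}

def store_for(data_class):
    """Return the backend responsible for a given MMOG data class."""
    return DATA_CLASS_TO_STORE[data_class]
```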
The performance of MMOGs and other resource-intensive applications is considered a critical factor for their success [16]. This section surveys techniques used by researchers to measure the performance of their solutions, as well as the approaches they used to improve it.
Often, the performance of fast-paced games is measured in frames-per-second as well as in terms of latency. Unsurprisingly, the latency expectations vary based on the game type.
For instance, these are some common game genres with their expected/desired latencies, as found in the related work:
First Person Shooter (FPS): 100 ms–250 ms [20],
Real-Time Strategy (RTS): 500 ms–1000 ms [31],
Role-Playing Game (RPG): 1 s–2 s [20].
Dhib et al. [21] argue that “ensuring an acceptable Quality of Experience (QoE) for all players is a fundamental requirement [for cloud-based games]”. The authors propose a mathematical model for measuring the QoE in MMOGs. They identify the global response delay as the most notable metric, which depends on several other parameters such as CPU and memory capacity. Furthermore, they argue that the network distance between the user and the server plays a significant role in the response delay. They evaluate (i) the performance of a cloud-based MMOG in terms of response delay, using simulations, and (ii) the degradation of the QoE as a function of the number of allocated VMs and the number of players, using an empirical approach. They then propose a mathematical model which expresses the QoE as a function of the network and processing delays. Using this model, the authors propose a dynamic VM allocation strategy which attempts to minimize the cost per customer while ensuring that a “minimal threshold” of QoE is maintained. The data gathered show that the approach resulted in “high player satisfaction” while maintaining 99% of QoE.
Lin and Shen [54] propose a lightweight system called CloudFog, which aims to improve the performance of cloud-based games by incorporating “supernodes”: powerful nodes located between the end-users and the cloud. CloudFog computes the game state on the cloud but uses the supernodes to carry out several intensive tasks, such as video rendering and streaming to client nodes. The authors identify challenges that hinder such games’ success: latency (response delay), network connection, user coverage and bandwidth cost. They utilize simulations to evaluate the performance of CloudFog and assess it against other systems, measuring (i) latency, (ii) playback continuity and (iii) user coverage. With these, the authors examined the effectiveness of the supernodes approach and concluded that CloudFog reduces latency, bandwidth consumption and bandwidth cost, while having a positive impact on user coverage.
Successful deployment of any MMOG requires an “ultralow-delay cost-efficient design” [18]. Thus, Barri et al. [18] investigate how costs can be minimized while satisfying the QoE requirements of players. They present a resource allocation framework for cloud infrastructure that benefits from virtualized resources. After presenting an “accurate delay model” to control the QoE, they consider all resources that can impact delay. They further propose a new delay-aware cost-minimization resource allocation scheme and perform simulations to evaluate its performance. Their MATLAB simulations are conducted on cloud servers randomly distributed across a large geographic area, and the workload model of the simulations is obtained from real online game data. The results show that the new scheme outperforms others in terms of cost efficiency while still satisfying the delay requirement, which is paramount to a satisfactory QoE.
Gascon-Samson et al. [55] state that, in order to maintain the quality of experience, game state update messages “must be delivered within specific time bounds”, depending on the type of MMOG. They describe flocking, the gathering of players in hotspots, as a challenge because of the high bandwidth requirements it places on a single server. To provide more efficient state updates, the authors present DynFilter, a game-oriented message processing middleware which filters out state update messages from entities located far away in order to reduce bandwidth needs. DynFilter is based on the publisher-subscriber pattern instead of sender-oriented message models. By running experiments on Amazon’s EC2 platform, they show that DynFilter is able to keep bandwidth use within quotas while still maintaining the QoE through the timely delivery of state update messages.
Similarly, Yusen et al. [56] propose a fairness-aware state update scheduling algorithm that minimizes inconsistencies while guaranteeing fairness in Multi-Server Distributed Virtual Environments (MSDVEs). They argue that MSDVEs suffer from saturation, which leads to high bandwidth use and huge resource demand. To achieve timely state dissemination and guarantee fairness, they first devise a new metric which uses time-space inconsistency to measure unfairness in an MSDVE. Second, they propose a fairness-aware update scheme that ensures updates are issued to different clients at the same time. Lastly, their algorithm, called FairLMH, minimizes inconsistency in MSDVEs. Through simulations, the authors show that FairLMH provides better fairness compared to other similar algorithms in multiple scenarios.
Assiotis and Tzanov [30] also discuss fault tolerance in a client-server architecture for MMOGs. Looking at performance from another angle, they state that a “system should recover the entire state of the world it represents as it was prior to the crash very quickly and as transparently as possible”. This suggests that other measures of a system’s performance could be (i) the frequency (or rather infrequency) of errors, (ii) its ability to transparently recover to a valid state and (iii) the time taken for the recovery to take place. While the authors say that their client-server architecture is not inherently fault-tolerant, they propose mirroring and replication to prevent players from being “locked out” of the game due to server crashes.
Jardine and Zappala [34] propose and evaluate a hybrid architecture using a simple MMOG and automated players (bots) which are programmed to move toward game objectives as quickly as possible. The authors created the game so that it can be played on both client-server and hybrid architectures. Assuming stable Internet connections for all players, they run a series of experiments, each consisting of fifty players, with a new player joining the game on average every second. They conduct their experiments first on the client-server architecture and subsequently on the hybrid architecture, measuring incoming and outgoing bandwidth and latency. These experiments reveal that the hybrid architecture can “save considerable bandwidth for the central server” and that “latency can be kept low” as long as there are enough peers capable of acting as regional servers.
Nae et al. [16] evaluate their proposed MMOG ecosystem using trace-based simulation. The authors propose an analytical model that can express the overheads of virtualization of cloud resources. They evaluate the effect of utilizing virtualized resources on the quality of gameplay using RuneScape, a popular MMOG, and state that virtualized resources can “negatively affect the MMOG session at high load volumes” (higher than 90%), which is what they initially expected. The authors demonstrate the usefulness of their ecosystem by showing that resource under-allocation grows only linearly with the VM size and start time, while the bandwidth and virtualization penalty have little impact on performance. In a related study [28], the same authors also assessed the impact of data center policies on the quality of resource provisioning. They found that “[dynamic resource provisioning] can be much more efficient than its static alternative even when the data centers are busy”. To support this, they present experimental results from trace-based simulations which highlight the real-time parallelization and load balancing of a game prototype using external data center resources, ultimately illustrating the advantage of dynamic resource provisioning.
Furthermore, El Rhalibi and Al-Jumeily [57
] propose a dynamic AoI management that aims to minimize delay and network traffic for MMOGs based on a hybrid architecture [57
]. They carry out their evaluation using simulations with 125, 500 and 1000 peers, with scenarios comprising both client-server and hybrid architectures. The results of their study show that their AoI management approach produces lower delay and less network traffic on their hybrid architecture compared to the client-server architecture without AoI management.
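The core idea of interest management can be sketched in a few lines. The helper below is our own illustration of the general AoI principle, not the authors' algorithm: a sender only forwards state updates to players whose position falls inside its area of interest, instead of broadcasting to every connected peer.

```python
# Minimal AoI filtering sketch (our illustration, not the cited scheme):
# an update is sent only to players inside the sender's interest radius,
# reducing network traffic compared to broadcasting to all peers.
import math

def in_aoi(sender_pos, receiver_pos, radius):
    # Euclidean distance test against the interest radius.
    return math.dist(sender_pos, receiver_pos) <= radius

def recipients(sender_pos, peers, radius):
    # peers: mapping of player id -> (x, y) position
    return [pid for pid, pos in peers.items() if in_aoi(sender_pos, pos, radius)]

peers = {"a": (0, 0), "b": (3, 4), "c": (50, 50)}
print(recipients((0, 0), peers, 10))  # ['a', 'b']
```

A dynamic scheme such as the one proposed would additionally adapt the radius or region shape at runtime; the fixed-radius version above only conveys the filtering principle.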
Negrão et al. [29
] evaluate their hybrid cloud solution using a setup comprising nine machines [29
]. They compare the performance of their system when all servers are state-partitioned against a version with a mix of task servers and state-partitioned servers. They obtain results for a varying number of clients simulated by bots, showing the difference in performance between the two approaches and underscoring the improvement in terms of higher frame rates and lower bandwidth.
EventWave, on the other hand, is evaluated using “synthetic microbenchmarks [which allow the authors to] vary the number and size of independent [event] contexts
”, effectively allowing for a greater range of experimental setups to be tested. The authors use two case studies to test their approach, a key-value store and a game server. Their experiments utilize a generator which produces a series of events to measure the system’s performance in terms of throughput. Using microbenchmarks, the authors discovered that “as the number of physical nodes increases, throughput does not drop; in fact it increases
”. Drawing conclusions from this, the authors claim that (i) the maximum throughput of EventWave applications does not drop as logical nodes spread over physical nodes and (ii) EventWave is able to harness the computational resources of multiple physical nodes to maintain performance under high levels of load. To test EventWave as a game server, the authors deployed 128 clients over 16 Amazon EC2 instances. They generated “artificial load” in the game world to simulate game behavior. To evaluate this case study, the authors measure the average latency of clients as they move around the game world. When the proposed elasticity mechanism is not activated, the server is unable to process client requests fast enough, resulting in a dramatic increase in latency. However, when the mechanism is activated, the server scales up to use all available physical nodes. Conversely, the system scales down when the number of players is reduced. This dynamic migration (i) allows resources to be provisioned for the game more rapidly, (ii) reduces the average latency experienced by the clients and (iii) allows the system to scale up or down according to demand.
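A scale-up/scale-down rule of the kind just described can be sketched as a simple latency-driven controller. The function below is our own construction in the spirit of that behaviour; the names, thresholds and node limits are invented assumptions, not the EventWave API.

```python
# Hedged sketch of a latency-driven elasticity rule (our own construction,
# not EventWave's implementation): add a physical node when observed
# latency exceeds a target, release one when the system is clearly idle.

def adjust_nodes(current_nodes, avg_latency_ms, target_ms=100,
                 low_watermark_ms=40, max_nodes=16, min_nodes=1):
    if avg_latency_ms > target_ms and current_nodes < max_nodes:
        return current_nodes + 1          # scale up under pressure
    if avg_latency_ms < low_watermark_ms and current_nodes > min_nodes:
        return current_nodes - 1          # scale down when idle
    return current_nodes                  # otherwise hold steady

print(adjust_nodes(4, 250))  # 5: latency too high, add a node
print(adjust_nodes(4, 20))   # 3: idle, release a node
print(adjust_nodes(4, 70))   # 4: within band, no change
```

A production system would of course migrate state when nodes are added or removed; the sketch only captures the scaling decision itself.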
Baker et al. [50
] argue that the development of Megastore was “aided by strong emphasis on testability
”. To detect bugs in their system, the authors used a pseudo-random framework—“the network simulator
”. This simulator is capable of “exploring the space of all possible orderings and delays of communications between simulated nodes or threads
”. Given the same seed, the simulator is also capable of reproducing the same behavior. Using this tool, the authors were able to detect bugs when a problematic sequence of events triggered an assertion failure in their code. They claim that even though an exhaustive search of all the possible states is impossible, the simulator explores “more than is practical by other means
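The key property of such a simulator, deterministic replay from a seed, can be demonstrated in a few lines. The sketch below is our own minimal construction in the spirit of that framework, not Google's simulator: a seeded pseudo-random generator picks one possible delivery ordering, and the same seed always reproduces the same ordering.

```python
# Minimal seeded message-ordering simulator (our own sketch, not the
# Megastore test framework): the same seed replays the same delivery
# order, so a failure found once can be reproduced deterministically.
import random

def simulate_delivery(messages, seed):
    rng = random.Random(seed)   # independent, deterministic generator
    order = list(messages)
    rng.shuffle(order)          # one possible network ordering
    return order

msgs = ["prepare", "accept", "commit", "ack"]
run1 = simulate_delivery(msgs, seed=42)
run2 = simulate_delivery(msgs, seed=42)
assert run1 == run2             # same seed, same behaviour
print(run1)
```

Sweeping over many seeds explores many interleavings; recording the seed of a failing run gives the reproducibility the authors describe.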
”. Using real-world deployments, the authors also observed that Megastore provides a latency tail “significantly shorter than that of the underlying layers
”. Furthermore, applications based on Megastore can withstand planned or unplanned outages with “little or no manual intervention
Security is an issue that is sometimes overlooked when it comes to developing games. Admittedly, it may not be a critical aspect of games, or at least not a priority during their development. However, modern MMOGs feature much more than a game world and simulated entities. Today’s games feature micro-transactions and sometimes store sensitive personal data, both of which require developers to take a serious stance on security to avoid financial loss and/or legal problems in the future. In this section, we explore the security concerns that have been expressed by relevant sources to provide insight into the relevance of security for MMOGs.
Shaikh et al. [15
] discuss security in the context of their proposed on-demand platform for MMOGs [15
]. They argue that by provisioning resources on a server-by-server basis, their system simplifies many issues arising from the “fine-grained resource sharing
” approach they use. The authors argue that while it may be desirable to host games with low resource requirements on the same server, this would require “sufficient protection
” so that no game is allowed to corrupt another game’s data. Furthermore, the authors discuss issues related to the peer-to-peer distribution model. They argue that while this model is appealing because of lower bandwidth costs, it adds a liability for game providers who use it to distribute content: “Players must allow […] untrusted machines to connect to their own machines
”, which exposes them to malicious actions.
The authors of References [31
] also argue that the peer-to-peer architecture may lead to security problems. Kavalionak et al. [32
] state that the lack of a central authority “hinders security and anti-cheating enforcement
” and argue that when clients have “heterogeneous constraints on computational, storage and communication capabilities
”, they become vulnerable to exploitation. Furthermore, GauthierDickey et al. [31
] argue that cheating is a problem that “plagues modern games
”. The authors agree that the main problem of the P2P approach is data manipulation and identify the types of cheats that can be employed by malicious users:
Fixed-delay cheat: where a fixed amount of delay is added into each packet.
Timestamp cheat: where timestamps are changed to alter when events occur.
Suppressed update cheat: where updates are purposely not sent to other players.
Inconsistency cheat: where different updates are sent to different players.
GauthierDickey et al. [31
] provide several solutions to the cheating problem. One of these is “Lockstep”, which secures against cheats by dividing game time into rounds, during which players send a cryptographic hash of their move to the other players. While this approach secures a game against cheating, it comes with the drawback of unacceptably high “playout latency”, which defeats its purpose. Other methods, such as Asynchronous Synchronization [58
] and the Sliding pipeline protocol [59
], which aim to resolve this problem, are likewise subject to high latency and offer incomplete protection against all possible types of cheats.
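The commitment step at the heart of Lockstep can be sketched with a standard hash-based commit-and-reveal pattern. This is a simplified illustration of the idea described above, not the cited protocol itself; the salting and the exact move encoding are our own assumptions.

```python
# Commit-and-reveal sketch of a lockstep round (simplified illustration,
# not the cited protocol): each player broadcasts a hash of its move
# first, and reveals the move only after all commitments arrive, so no
# one can choose a move based on what others did.
import hashlib
import os

def commit(move: str):
    salt = os.urandom(16)   # random salt prevents guessing small move spaces
    digest = hashlib.sha256(salt + move.encode()).hexdigest()
    return digest, salt     # digest is broadcast now; salt is kept secret

def verify(digest: str, salt: bytes, revealed_move: str) -> bool:
    # Other players check the reveal against the earlier commitment.
    return hashlib.sha256(salt + revealed_move.encode()).hexdigest() == digest

digest, salt = commit("move north")
assert verify(digest, salt, "move north")       # honest reveal passes
assert not verify(digest, salt, "move south")   # altered move is caught
```

The playout-latency drawback is visible here too: no move can be acted upon until every player's commitment and reveal have completed the round trip.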
Jardine and Zappala [34
] propose a P2P/client-server hybrid architecture in which the critical processing events occur on a central server, while non-critical positional updates occur using the P2P approach [34
]. These authors claim that the ability to cheat “is significantly limited
” in their hybrid architecture because the central server “controls all access to game state
”. They discuss that one possible attack is when a regional server (P2P-based) drops or delays some of the state updates (the suppressed update cheat). Their solution copes with this issue by having clients monitor regional server updates for latency and loss and report these issues to the central server, which may replace the regional server if enough players complain. In addition, the central server requires each regional server to send a positional update periodically. If three consecutive updates are missed, the regional server is considered to have failed. This mechanism provides “additional protection against poor performance or failure
”. The authors also identify another possible attack when “a player acting as a regional server [joins] its own region
”. If this is allowed, the player may be able to see how other players move before the move is made. The authors remove this possibility by making the central server replace a regional server that moves into its own region. Furthermore, the regional server “may attempt to collude with other players in the region
”. To tackle this problem the authors implement an auditing mechanism that checks if each state-changing move made was legitimate by using logs to verify that a player had enough time to move into the area where the action occurred. Finally, players may also “receive an unfair advantage by joining many regions at the same time
”. To eliminate this problem, the central server controls regional server assignments. Whenever a player moves between two regions, they have to contact the central server, get the identity of the new regional server and allow the central server to update the membership list for the affected regions.
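The three-missed-updates liveness check described above is simple enough to sketch directly. The class below is our own illustration of that mechanism (class and method names are invented, not from the paper): the central server counts consecutive missing positional updates and declares the regional server failed at the third miss.

```python
# Illustrative sketch (our own, based on the mechanism described by
# Jardine and Zappala) of the central server's liveness check: a regional
# server must send a positional update every period and is declared
# failed after three consecutive misses.

FAILURE_THRESHOLD = 3

class RegionalServerMonitor:
    def __init__(self):
        self.missed = 0

    def tick(self, update_received: bool) -> bool:
        """One reporting period; returns True once the server counts as failed."""
        self.missed = 0 if update_received else self.missed + 1
        return self.missed >= FAILURE_THRESHOLD

mon = RegionalServerMonitor()
assert not mon.tick(False)   # 1 miss
assert not mon.tick(False)   # 2 misses
assert mon.tick(False)       # 3 consecutive misses -> failed, replace it
```

A single on-time update resets the counter, so transient packet loss does not trigger a replacement.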
5.7. Other Trends in Cloud-Based Games
Besides the previous points, which were discussed based on the identified criteria, a few more approaches exist which are better described as a separate trend. These approaches are based primarily on server-side processing and light client-side rendering. While the main difference between these games concerns their architecture, there are further significant differences, such as their monetization model.
For instance, a study by Lin and Shen [60
] argues that building, deploying and maintaining large data centers is cost-prohibitive [60
]. These authors propose an alternative, lightweight system called CloudFog
. The fog
concept uses powerful super-nodes that act as intermediaries between cloud servers and client machines. Using this approach, the intensive computation of the game state occurs in the cloud. This is also known as Cloud Gaming
]. Updates are sent by the cloud servers to super-nodes, which update the virtual world, render the game graphics and stream them as video to the players. An advantage of this solution is that users can play resource-intensive games without expensive hardware. In addition, latency is reduced because nearby super-nodes are used to render and transmit video which would otherwise have to be transmitted by far-away cloud servers. Thirdly, it reduces bandwidth costs, as video is downloaded from nodes at the edge rather than at the core of the cloud. On the downside, this approach requires high-speed, stable Internet connections, which cannot be guaranteed in all contexts, such as mobile gaming.
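The latency benefit of super-nodes comes from clients attaching to a nearby intermediary rather than a distant cloud server. The helper below is our own illustration of that selection step (identifiers and latencies are hypothetical, not part of CloudFog): the client measures round-trip times and picks the lowest-latency super-node.

```python
# Hypothetical helper (our illustration of the CloudFog idea, not its
# implementation): a client attaches to the super-node with the lowest
# measured round-trip time, so video streams from the edge rather than
# from a distant cloud server.

def pick_super_node(latencies_ms):
    # latencies_ms: mapping of super-node id -> measured RTT in ms
    return min(latencies_ms, key=latencies_ms.get)

print(pick_super_node({"sn-east": 45, "sn-west": 120, "sn-local": 12}))  # sn-local
```

Real systems would also weigh super-node capacity and churn, but proximity-based selection is the essence of the latency argument made above.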
The use of edge computing is not unique to cloud gaming
. A study by Burger et al. [61
] argues that the performance of MMOGs in terms of latency depends on the geographic distribution of players. In order to minimize latency, the game servers should move closer to the players and towards the edge of the network. To prove this, the authors analyze match histories and statistics from Steam
, a popular gaming platform. Using this data, they develop a model which can predict player location and match duration. They use their models to evaluate the migration of MMOG Dota 2 matches toward the edge using an event-based simulation framework. Their emphasis lies on how server placement impacts the QoE of games. The results of the study show that deploying edge servers in many cases halves the distance between a player and a server, thus reducing the load on the dedicated server. Ultimately, a higher number of edge servers with smaller capacities appears to be more beneficial than a single, more powerful dedicated server, despite the added operational overhead.
More recently, a study by Plumb and Stutsman [62
] argues that “Google’s Edge Network changes everything we have concluded about peer-to-peer networks over the past decade
”. As the authors describe, Google’s Edge Network allows the inclusion of trusted peers in untrusted node clusters, allowing developers to explore P2P algorithms while maintaining the security of their data. The authors investigate the possible advantages of this approach for game developers and MMOGs. They gather ping data and map the population counts of several areas in the United States, which they use to run a simulation comparing existing solutions, such as a ‘traditional topology’ and an ‘edge topology’, with their own ‘Optimized edge network’. Their results indicate that this optimized solution ‘presents potential’ both in terms of performance (latency) and in maintaining security in a P2P network.
An extreme approach for thin-client computing, named Stadia
, is developed by Google [63
]. This is a cloud gaming service, built to stream games at high resolutions and frame rates. It requires an Internet connection but no gaming hardware (such as high-end GPUs and RAM) on the client-side. Stadia renders game graphics on the cloud-based hardware and uses YouTube-like functionality for streaming media to the user. As a cloud-based solution, Stadia offers “tremendous scale
” and provides a limited variety of hardware and software stacks for developers to use. While it does support multi-player games, it is not designed to realize MMOGs, even though its architecture appears suitable for such games as well. On the other hand, Arcade by Apple [64
] is an online store which offers games that operate offline—which excludes MMOGs [64
]. The games execute on the player’s device but use the cloud to synchronize their state so that players can seamlessly switch devices.