Article

Mathematical Analysis of Page Fault Minimization for Virtual Memory Systems Using Working Set Strategy

by Aslanbek Murzakhmetov 1,2,*, Gaukhar Borankulova 1, Arseniy Bapanov 1 and Gabit Altybayev 3,*

1 Department of Information Systems, Faculty of Technology, M.Kh. Dulaty Taraz University, Taraz 080001, Kazakhstan
2 School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA
3 Department of Radio Engineering, Electronics and Telecommunications, International Information Technologies University, Almaty 050040, Kazakhstan
* Authors to whom correspondence should be addressed.
Information 2025, 16(10), 829; https://doi.org/10.3390/info16100829
Submission received: 25 July 2025 / Revised: 12 September 2025 / Accepted: 21 September 2025 / Published: 24 September 2025

Abstract

Poor code locality in virtual memory systems significantly contributes to page faults, leading to degraded system performance. Although many solutions aim to minimize page faults, most rely on clustering techniques that do not quantify the approximation error relative to the optimal solution. In this work, we develop a novel mathematical model based on the Working Set strategy combined with a geometric interpretation of the computational process via a Hasse diagram. This approach enables the reduction of the problem dimensionality and facilitates identification of critical control states under realistic constraints. We formalize the minimization of expected page faults as a discrete optimization problem with well-defined functionals and constraints. Experimental evaluation demonstrates that our model achieves lower average page faults and execution times compared to classical algorithms, especially under poor code locality conditions. Our method also provides a foundation for obtaining ε-optimal solutions and paves the way for designing efficient and cost-effective page replacement algorithms with provable guarantees. These contributions establish both theoretical and practical advances in virtual memory management.

1. Introduction

Virtual memory is a memory management technique that allows a computer to use more memory than is physically available by temporarily transferring data from Random Access Memory (RAM) to disk storage. This is essential for running large programs or multiple applications simultaneously without exhausting physical memory [1]. In page-based systems, memory is divided into pages, and the operating system dynamically manages their placement between RAM and disk. This swapping, however, can lead to page faults: a page fault occurs when a process tries to access a virtual memory address for which there is currently no valid mapping in physical RAM (a minor fault if the page is already resident but unmapped, a major fault if it must be loaded from disk). Page faults can also result from poor code locality, which directly affects their frequency because it determines how often and in what order the program accesses memory pages [2,3,4]. Poor code locality causes the program to switch between pages frequently and can lead to thrashing, where the system spends more time paging than executing code. The problem is how to relocate blocks (or program segments) across pages of virtual memory so as to minimize page faults [5,6]. Program code transformations, such as program restructuring [7,8,9] and refactoring [10,11,12,13,14], as well as various forms of code reorganization, have a positive impact on page faults, particularly in terms of locality. The absence of a definitive solution to this problem has sustained ongoing interest and research in both past and present studies.
Many formulations of page fault or program optimization problems lead to complex combinatorial challenges, making it necessary in practice to rely on approximate or heuristic approaches. Most existing research in this area is based on clustering techniques [15,16]. While these techniques have shown improvements in experimental settings, they provide only approximate solutions with unknown accuracy, i.e., the clustering approach does not estimate how far the obtained solutions are from the unknown exact (optimal) solutions [17,18]. The Working Set (WS) strategy, proposed by P. Denning [19], aims to prevent thrashing (excessive swapping that slows down the system) by ensuring that the pages a process needs are resident in memory. This strategy is particularly relevant for optimizing performance in systems with limited RAM, as it balances the degree of multiprogramming (running multiple processes) against CPU utilization. In addition to the experimental comparison with classical algorithms such as Working Set, Least Recently Used (LRU), and First In First Out (FIFO), this study also examines recent research efforts aimed at optimizing virtual memory management and reducing page fault rates. In [20], the proposed page replacement policy monitors the current working set size and controls the deferring level of dirty pages, preventing excessive preservation that could increase page faults, thus optimizing performance while minimizing write traffic to phase-change memory (PCM). In [21], the authors modified the ballooning mechanism to enable memory allocation at huge-page granularity; they then developed and implemented a huge-page working set estimation mechanism capable of precisely assessing a virtual machine's memory requirements in huge-page-based environments, and, integrating these two mechanisms, they employed a dynamic programming algorithm to attain dynamic memory balancing. Furthermore, other studies [22,23,24] have focused on workload-aware estimation of working set sizes in virtualized systems, presenting adaptive memory allocation strategies that mitigate page faults by accounting for workload-specific patterns. Such techniques can complement our proposed model, which relies on a formal discrete approach combined with averaging across multiple program executions, thereby extending its applicability to complex and dynamic execution environments. Unlike existing approaches, however, our method introduces a geometric interpretation of the computational process through Hasse diagrams, which not only reduces the dimensionality of the optimization problem but also provides an algorithmic foundation for systematically identifying critical memory states. This problem is NP-hard, meaning that finding the optimal solution is computationally intractable for large instances, as it would require checking an exponentially large number of possibilities. Thus, the research also has a fundamental aspect [25,26,27,28] that has encouraged our research efforts.
In this paper, we focus on the problem of page fault minimization for virtual memory systems. Motivated by the need to achieve an optimal or near-optimal solution, our goal is to construct an approach based on identifying a functional and corresponding constraints under the working set swapping strategy in order to minimize page faults. The novelty of this work lies in introducing a mathematical approach that integrates the Working Set strategy with a geometric interpretation of the computational process through Hasse diagrams. A key innovative element is the use of the Hasse diagram: the diagram is not merely a visualization tool but becomes an algorithmic foundation for reducing the dimensionality of the page fault minimization problem. This approach enables systematic identification of critical memory states and formalizes the optimization of program code block allocation across memory pages under strict functional and combinatorial constraints, paving the way for deriving optimal or ε-optimal solutions. Furthermore, the study establishes a foundation for the development of adaptive and intelligent page replacement algorithms that go beyond traditional heuristic-based approaches. Experimental results demonstrate the superiority of the proposed method over classical page replacement algorithms, emphasizing its significance for advancing next-generation memory management mechanisms.

2. Methods

In computing systems with page-based organization of virtual memory, programs generate a sequence of references (accesses) to their pages during execution, which we will call a "control state". At any moment of program execution, the physical memory (RAM) does not contain all pages of the program, but only a part of them (the resident set). Figure 1 shows an example of virtual and physical memory. Virtual memory contains blocks of different sizes divided into pages. The physical memory contains copies of the virtual memory pages, and here the blocks are restructured. The size of the physical memory at any moment of the computational process is much smaller than the size of the virtual memory. Let the program code with poor locality that requires segmentation consist of $n$ blocks with numbers $i_1, i_2, \ldots, i_n$, which are singled out in advance and scattered over $p$ pages $S_1, S_2, \ldots, S_p$ of virtual memory.
Executing such code generates a redundant number of page faults, largely because the program code is poorly structured, and this degrades the performance of both the program and the system itself. Let $v_r$ be the length of the $r$-th page, $r = 1, 2, \ldots, p$, and let $l_i$ be the length of block $i$, $i = 1, 2, \ldots, n$; thus, the system supports pages of varying size. By blocks, we mean parts of the code such as subroutines, linear segments of code, separate interacting programs, data blocks, etc. The distribution of blocks $i_1, i_2, \ldots, i_n$ over pages $S_1, S_2, \ldots, S_p$ is assigned, for example, via a Boolean matrix $x = \|x_{ri}\|_{p \times n}$ as shown in Figure 2, where an element $x_{ri} = 1$ if the block with number $i$ belongs to the page with number $r$, and $x_{ri} = 0$ otherwise. The set of all such matrices is denoted by $X$.
In our case, the working set $R(q, x)$ is generated by a control state $q$ and a matrix $x$. The control state $q_t$ of the program at moment $t$ is the sequence of program references to its pages over the last $k$ moments before moment $t$. Figure 1 indicates a control state $q = (i_1, i_2, \ldots, i_{m_q})$, where $i_j$ is a block number, $j = 1, 2, \ldots, m_q$, belonging to $q$; each such block is marked as 〇. The other symbol in Figure 1 denotes blocks (or their numbers) that do not belong to $q$ but belong to the corresponding page of the working set $R(q, x)$ and are present in physical memory. In other words, all elements 〇 are blocks that are frequently referenced and form the working set; the other elements are blocks that do not form the working set but may be present in physical memory at any moment of the computational process. For the matrix $x \in X$, there are constraints (a)–(c) [29], which are described below:
Functional: As the functional of the main problem, we take the mathematical expectation of the number of page faults for one run of the program code. As the functional of the auxiliary problem, we take the mean value of page faults over $h \geq 1$ runs of the program code.
Constraint (a): The total length of the blocks belonging to any page does not exceed the length of this page.
Constraint (b): Any block of the program code belongs to exactly one page of the program code.
Constraint (c): The total length of any working set generated during execution of the program code does not exceed a system constant that is known in advance.
Constraints (a)–(c) must hold for the matrix $x = \|x_{ri}\|_{p \times n}$, which defines the distribution of the blocks $i_1, i_2, \ldots, i_n$ over the pages $S_1, S_2, \ldots, S_p$. This Boolean matrix plays an important role in our consideration, since it determines the structure of a program, i.e., the distribution of its blocks over pages; all matrices satisfying constraints (a)–(c) form the set $X$. Figure 2 presents an example of the structure of the matrix $x = \|x_{ri}\|_{p \times n}$ with a control state $q = (i_1, i_2, \ldots, i_{m_{q_t}})$ singled out among its columns. The matrix $x$ is used to calculate the function $\delta_{qi}(x)$ introduced below.
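To make the role of the matrix $x$ concrete, the following minimal Python sketch encodes one hypothetical distribution of $n$ blocks over $p$ pages and checks constraints (a) and (b); all sizes and lengths here are illustrative assumptions, not values from the paper's experiments.

```python
import numpy as np

p, n = 3, 6                              # pages and blocks (hypothetical sizes)
v = np.array([100, 120, 80])             # v_r: length of page r
l = np.array([30, 40, 50, 20, 40, 25])   # l_i: length of block i

# x[r, i] = 1 iff block i is placed on page r (one hypothetical distribution)
x = np.zeros((p, n), dtype=int)
for i, r in enumerate([0, 0, 1, 1, 2, 2]):
    x[r, i] = 1

def satisfies_a(x, l, v):
    """Constraint (a): blocks on each page fit within the page length."""
    return bool(np.all(x @ l <= v))

def satisfies_b(x):
    """Constraint (b): every block belongs to exactly one page."""
    return bool(np.all(x.sum(axis=0) == 1))

assert satisfies_a(x, l, v) and satisfies_b(x)
```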

2.1. Geometric Interpretation of the Computational Process

We consider the geometric interpretation of the computational process using the Hasse diagram, a graphical representation of a partially ordered set [30]. Each element of the set is represented by a node. A line (edge) exists between each pair of nodes $b$ and $c$ such that $b < c$ and there is no $d$ such that $b < d < c$; in this case, we say that $c$ covers $b$. In the combinatorial space, the control state $q_0$ may occur at later moments, when our program is unexpectedly offloaded from physical memory and, after a while, activates again as if running from the start (a cold start). Another way to start is a warm start, when the system tries to continue the computational process from level 1 or 2. In what follows, we propose that any such event be treated as a warm start (restart) and counted as one additional page fault, which we take into account in expressions (1) and (2) for the functionals of the main and auxiliary problems.
Let the set of control states $q$ be denoted by $Q$. Once the set $Q$ is formed, we have to find the subset of $Q$, denoted $\hat{Q}$, which will be useful under constraint (c). Any element $\hat{q} \in \hat{Q}$ has the following property: there is no element $q \in Q$ such that $\hat{q} \subset q$. Conceptually, looking at the Hasse diagram in Figure 3, the element $\hat{q}$ is a peak node under any random walk path over the nodes of the diagram. Following our approach, any sequence of control states from $Q$ along the axis $t$ with fixed $\theta \in D$ corresponds to a random walk path over the nodes of the combinatorial space. Thus, the computational process is a random walk path through the nodes of the combinatorial space, for which we use the Hasse diagram.
The Hasse diagram is a two-pole combinatorial space with number of blocks $n = 6$ and $n + 1$ levels. The lower pole, located at level $0$, corresponds to the empty set $q_0 = \emptyset$ for starting any process. The upper pole corresponds to the full set of blocks of the program code, i.e., $n = 6$. Elements of the set $Q$ correspond to appropriate nodes of the combinatorial space. Any control state $q = (i_1, i_2, \ldots, i_m) \in Q$ corresponds to the intermediate node $(i_1, i_2, \ldots, i_m)$ at level $l$ ($1 \le l < n$); it is an ordered record, such that $i_1 < i_2 < \cdots < i_m$, and it connects with $m$ nodes of level $l - 1$ and with $n - m$ nodes of level $l + 1$. For example, node $(2,3)$ at level 2 connects with two nodes at level 1, namely $(2)$ and $(3)$, and with four nodes at level 3, namely $(1,2,3)$, $(2,3,4)$, $(2,3,5)$, and $(2,3,6)$. Among the nodes of the singled-out path, the black nodes are $(2,3)$, $(1,2,4)$, $(1,3,4,5)$, and $(3,6)$; they are needed to optimize the nodes $\bar{q}$, which are the lower nodes of the edges $(\bar{q}, \hat{q})$, namely $(2,4)$, $(1,4,5)$, and $(3)$.
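The cover relations just described are easy to enumerate programmatically. The sketch below is a minimal Python illustration (the node and block count follow the running example) listing the neighbors of a node in the Boolean-lattice Hasse diagram:

```python
def neighbors(node, n):
    """Neighbors of a node in the Hasse diagram over blocks {1..n}:
    m edges down (remove one block), n - m edges up (add one block)."""
    s = set(node)
    down = [tuple(sorted(s - {i})) for i in sorted(s)]
    up = [tuple(sorted(s | {i})) for i in range(1, n + 1) if i not in s]
    return down, up

# Node (2, 3) at level 2 with n = 6 blocks, as in the running example:
down, up = neighbors((2, 3), 6)
print(down)  # [(3,), (2,)] -> nodes (3) and (2) at level 1
print(up)    # [(1, 2, 3), (2, 3, 4), (2, 3, 5), (2, 3, 6)] at level 3
```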
Figure 4 indicates a dedicated random walk path along nodes under $k = 4$ that corresponds to a working set, with the $\hat{q}$ and $\bar{q}$ nodes singled out on it. Eventually, we have a random walk path over nodes with $\hat{q}$ and $\bar{q}$. Over multiple runs, the nodes $\hat{q}$ and $\bar{q}$ may change, but they always exist in the computational process, including the final situation, when the set $Q$ is determined. The algorithm to determine $\hat{Q}$ is very simple and consists of sequentially examining the elements of $Q$ and comparing them against the current $q$: if the current $q$ is included in some other element of $Q$, then $q$ is crossed out of consideration as a candidate for $\hat{q}$; otherwise, the inclusion check continues with the next element of $Q$. If no such containing element can be found and the set $Q$ is exhausted, then $q$ becomes $\hat{q}$, and we add $\hat{q}$ to $\hat{Q}$. We repeat the process with the next elements of $Q$ as $q$ until the set $Q$ is exhausted and the set $\hat{Q}$ is formed. It should be noted that any run of the program takes finite time. The construction of the set $\hat{Q}$ is thus carried out by sequentially verifying the elements of $Q$ for maximality with respect to inclusion. The theoretical computational complexity of this process is $O(|Q|^2)$, since each element must be compared with all others. In practice, due to constraints on the size of the working set and the application of optimizations, the construction of $\hat{Q}$ remains computationally feasible for real-world tasks.
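A minimal Python sketch of this sequential inclusion check follows; control states are represented as frozensets of block numbers, and the example states are taken from the singled-out path in Figure 4 (illustrative only):

```python
def maximal_states(Q):
    """Return Q_hat: elements of Q not strictly contained in any other element.
    Each control state is a frozenset of block numbers; O(|Q|^2) comparisons."""
    Q = list(Q)
    Q_hat = []
    for q in Q:
        # q is discarded as a candidate as soon as some q2 strictly includes it
        if not any(q < q2 for q2 in Q if q2 is not q):
            Q_hat.append(q)
    return Q_hat

# States from the singled-out path in Figure 4 (illustrative):
Q = [frozenset(s) for s in
     [{2, 3}, {2, 4}, {1, 2, 4}, {1, 4, 5}, {1, 3, 4, 5}, {3}, {3, 6}]]
print(maximal_states(Q))  # -> the peak nodes {2,3}, {1,2,4}, {1,3,4,5}, {3,6}
```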
Importantly, the Hasse diagram in our approach is not merely a visualization tool but an integral part of the algorithmic optimization. By representing the combinatorial structure of all control states as a partially ordered set, the diagram enables us to efficiently identify the subset of control states that are admissible working sets under constraint (c). In practice, these are the peak-nodes that are the largest memory-resident sets encountered along any execution path. Enforcing the memory constraint (c) only on these peak states is sufficient, since if every maximal working set satisfies the size limit, all intermediate subsets will automatically satisfy it as well. This insight, derived from the Hasse diagram’s poset structure, dramatically reduces the dimensionality of the constraint system: the full inequality system (5) collapses to a much smaller system (7) that considers only critical states. In other words, the partial order embodied in the Hasse diagram guides a systematic simplification of the constraints by revealing which working set inequalities are essential and which are redundant, thus directly contributing to the optimization process and not just to visualization. Furthermore, the Hasse diagram provides a framework for interpreting and tracking the random walk paths of program execution through the state space. Each node-to-node transition in the diagram corresponds to adding or removing a block in the working set, and we assign a weight to each such edge indicating whether this transition incurs a page fault. This edge-weighting scheme allows us to map the occurrence of page faults directly onto the poset structure: a sequence of memory references (a control state trajectory) becomes a path on the Hasse diagram whose cumulative edge weight equals the total number of page faults for that execution. By exploiting this weighted poset, we construct valuation functions that aggregate page fault costs along any path, which in turn guides our optimization. In effect, the Hasse diagram acts as a computational scaffold for the algorithm—it organizes the state space so that page fault events can be counted and optimized via edge weights, facilitating the application of discrete optimization to find near-optimal solutions. Thus, the Hasse diagram supports every step of our approach: it encodes the state space structure, restricts the search to feasible and critical states, reduces the number of constraints, and provides a basis for calculating and minimizing page faults via the weighted edges and valuation function.

2.2. Functionals $F_0(x)$ and $F_h(x)$. Constraints for $x \in X$

In this subsection, we describe the functionals and constraints of the main and auxiliary problems. We start by finding expressions for the functionals of both problems and for the corresponding constraints on the matrix $x = \|x_{ri}\|_{p \times n}$. It is also useful to establish the connection between the calculations to be conducted and their geometric interpretation. Using the information given above, we introduce a random variable $\xi_{qi}$, the number of references to block $i$ during the execution of control state $q$ in one run of the program. Let the random variable $\xi_{qi}^{j}$ be the same as $\xi_{qi}$ but for the $j$-th run of the program, $j = 1, 2, \ldots, h$. Let the expected values of $\xi_{qi}$ and $\xi_{qi}^{j}$ be
$$E(\xi_{qi}) = E(\xi_{qi}^{j}) = E_{qi}, \quad j = 1, 2, \ldots, h,$$
and a mean value:
$$E_{qi}^{h} = \frac{1}{h} \sum_{j=1}^{h} \xi_{qi}^{j}, \quad \text{for any } q \in Q, \ i = 1, 2, \ldots, n.$$
The value $\delta_{qi}(x)$ can be calculated in the following way:
$$\delta_{qi}(x) = \begin{cases} 0, & \text{if block } i \text{ belongs to some page } S \in R(q, x), \\ 1, & \text{otherwise.} \end{cases}$$
In other words, the value $\delta_{qi}(x) = 0$ corresponds to the absence of a page fault under the event $qi$, i.e., block $i$ belongs to some page $S$ from $R(q, x)$; otherwise, the value $\delta_{qi}(x) = 1$ corresponds to a page fault. If block $i \in q$, then $\delta_{qi}(x) \equiv 0$ for any $x \in X$. Next, we can remove the control state $q_0$ from $Q$; then the total number of page faults for one run of the program is
$$\xi = \sum_{q \in Q} \sum_{i=1}^{n} \xi_{qi} \cdot \delta_{qi}(x) + \sum_{i=1}^{n} \xi_{q_0 i},$$
and for the functional of the main problem, which has to be minimized, we have
$$F_0(x) = \sum_{q \in Q} \sum_{i=1}^{n} E_{qi} \cdot \delta_{qi}(x) + \sum_{i=1}^{n} E_{q_0 i} \to \min_{x \in X}. \tag{1}$$
It is worth noting that, in the expression for $\xi$, any value $\xi_{qi}$ does not depend on the matrix $x \in X$; quite the opposite, the function $\delta_{qi}(x)$ depends on the given $q \in Q$, $i$, and $x \in X$, and does not depend on the random event $qi$ or where it happens. For the functional $F_h(x)$ of the auxiliary problem, it holds that
$$F_h(x) = \sum_{q \in Q} \sum_{i=1}^{n} E_{qi}^{h} \cdot \delta_{qi}(x) + \sum_{i=1}^{n} E_{q_0 i}^{h} \to \min_{x \in X}. \tag{2}$$
It is interesting to note that the value $E_{qi}^{h}$ from (2) can be assigned as a weight to the edge connecting node $q$ and node $q \cup i$ in the Hasse diagram when $\delta_{qi}(x) = 1$; the edge is weighted zero if $\delta_{qi}(x) = 0$. This helps to calculate the value of the functional $F_h(x)$ for a fixed $x \in X$: it suffices to determine whether the weight of each edge in question is nonzero or zero, i.e., whether a page fault has occurred or not. The system of constraints (a)–(c), setting the set $X$ of admissible solutions both for the main problem (1) and for the auxiliary problem (2), is written in the form:
$$\sum_{i=1}^{n} l_i x_{ri} \le v_r, \quad r = 1, 2, \ldots, p; \tag{3}$$

$$\sum_{r=1}^{p} x_{ri} = 1, \quad i = 1, 2, \ldots, n; \tag{4}$$

$$\sum_{r=1}^{p} v_r H_{qr}(x) \le N_q, \quad q \in Q; \tag{5}$$

$$x_{ri} \in \{0, 1\}, \quad r = 1, 2, \ldots, p; \ i = 1, 2, \ldots, n, \tag{6}$$
where in (5) the value $v_r$ is the length of page $r$, $r = 1, 2, \ldots, p$. The system (3)–(6) contains $p + n + |Q|$ non-trivial relations. Note that constraints (3)–(5) correspond to constraints (a)–(c), respectively. The function $H_{qr}(x)$ is defined by $H_{qr}(x) = 1$ if page $S_r \in R(q, x)$ and $H_{qr}(x) = 0$ otherwise, i.e., $H_{qr}(x)$ is the characteristic function of $R(q, x)$. For given $q$ and $r$, it is easy to calculate $H_{qr}(x)$ via the elements of the matrix $x$: if $q = (i_1, i_2, \ldots, i_{m_q}) \in Q$, then $H_{qr}(x) = \max_{1 \le j \le m_q} x_{r i_j}$.
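For a fixed $x$, the quantities above translate directly into code. The following Python sketch (a minimal illustration; the empirical means $E_{qi}^{h}$ are assumed to be given, e.g., collected over $h$ runs, and the cold-start term for $q_0$ is omitted) evaluates $H_{qr}(x)$, $\delta_{qi}(x)$, and the functional $F_h(x)$ of (2):

```python
def H(q, r, x):
    """H_qr(x) = 1 iff page r belongs to the working set R(q, x),
    i.e., page r hosts at least one block of the control state q."""
    return max(x[r][i] for i in q)

def delta(q, i, x):
    """delta_qi(x) = 0 iff block i lies on some page of R(q, x); 1 means a page fault."""
    pages_of_q = [r for r in range(len(x)) if H(q, r, x) == 1]
    return 0 if any(x[r][i] == 1 for r in pages_of_q) else 1

def F_h(x, Q, E_h):
    """Empirical functional (2) without the q0 (cold start) term.
    E_h maps (q, i) -> mean number of references to block i in state q over h runs."""
    n = len(x[0])
    return sum(E_h.get((q, i), 0.0) * delta(q, i, x)
               for q in Q for i in range(n))
```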

2.3. Reduction of the Number of Inequalities for Control States in the Working Set $R(q, x)$

Constraint (5) contains $|Q|$ inequalities, and there may be very many of them. There is an opportunity to substantially reduce the number of inequalities in (5). As already mentioned, from a practical point of view, we may assume that there exists a system constant, say $N$, which limits the size of any working set $R(q, x)$ and which is known in advance. Using the set $\hat{Q}$, we can then replace system (5) with
$$\sum_{r=1}^{p} v_r H_{\hat{q}r}(x) \le N, \quad \hat{q} \in \hat{Q}, \tag{7}$$
having first set $N_q = N$ for all $q \in Q$ in (5). Let $|R(q, x)|$ denote the length of the working set $R(q, x)$, $q \in Q$, $x \in X$. To justify the substitution, it is worth paying attention to Figures 3 and 4, where the black nodes correspond to control states $\hat{q} \in \hat{Q}$ and the nodes $q \in Q$ are such that $q \subseteq \hat{q}$. The following relations hold: if $q \subseteq \hat{q}$, then $R(q, x) \subseteq R(\hat{q}, x)$ and $|R(q, x)| \le |R(\hat{q}, x)|$, where $q \in Q$, $\hat{q} \in \hat{Q}$; hence, if inequality (7) holds for some $\hat{q} \in \hat{Q}$, it also holds for any $q \in Q$ with $q \subseteq \hat{q}$. Here, it is taken into account that any $q \in Q$ belongs to at least one $\hat{q}$ from $\hat{Q}$, as shown in Figure 4 for node $(3)$. Evidently, the set $X$ of admissible solutions is non-empty, since the initial distribution of blocks $i_1, i_2, \ldots, i_n$ over pages $S_{g_1}, S_{g_2}, \ldots, S_{g_p}$ satisfies constraints (3)–(6).
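In code, this monotonicity argument means constraint (c) only needs to be enforced on the peak states. A minimal Python sketch follows, reusing H from the previous sketch and maximal_states from Section 2.1; N is the assumed system constant:

```python
def ws_length(q, x, v):
    """|R(q, x)|: total length of the pages hosting the blocks of state q."""
    return sum(v[r] for r in range(len(v)) if H(q, r, x) == 1)

def satisfies_c(x, Q, v, N):
    """Constraint (7): check only the |Q_hat| peak states instead of all |Q| states.
    If every maximal working set fits within N, every contained state fits too."""
    return all(ws_length(q_hat, x, v) <= N for q_hat in maximal_states(Q))
```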
The approach proposed in this work is based on a mathematical model that incorporates pages of dynamic size, as reflected in the system of constraints (3), where the length $v_r$ of each page may differ. This model is more general and flexible, allowing for the representation of both variable and fixed page sizes. In the case of systems with fixed page sizes, the model reduces to a special case with identical values $v_r = v$ for all pages $r$. All derived results, including the formalization of working sets, functionals, and constraints, remain applicable, while the computational component is simplified due to the uniformity of page sizes. Thus, the proposed method is universal and applicable to systems with both variable and fixed page sizes, which broadens its scope of practical applicability and ensures its relevance to the majority of traditional virtual memory systems.

3. Results

The nonlinear model of program code reorganization constructed above contains the nonlinear functional (1) and/or (2), the linear inequalities (3) and (4), and the nonlinear system of constraints (5). The cardinality of the set $\hat{Q}$ in (7) is not too large in contrast with the cardinality of $Q$ in (5): instead of controlling the size of every $R(q, x)$ with $|Q|$ inequalities in (5), after substituting (7) for (5) we have only $|\hat{Q}|$ inequalities. As for functional (1) or (2), the number of addends can be reduced based on the fact that if block $i \in q$, then $\delta_{qi}(x) \equiv 0$ for any $x \in X$; hence, the inner sum in (1) or (2) runs, instead of over $i = 1, 2, \ldots, n$, only over the indexes $i \in I_q$, where the set $I_q$ excludes those $i$ belonging to $q$. For given $q = (i_1, i_2, \ldots, i_{m_q})$ and $i$, under the event $qi$: if $i_1 \in (i_2, i_3, \ldots, i_{m_q})$ and $i \in (i_2, i_3, \ldots, i_{m_q})$, there will be no page fault; if $i_1 \notin (i_2, i_3, \ldots, i_{m_q})$ and $i \in (i_2, i_3, \ldots, i_{m_q})$, there will also be no page fault. It is important to note that some methods of discrete optimization are based on the construction of a valuation function; in our case, for problem (2), on the basis of the geometric interpretation of the computational process, it is the lower valuation function written on the left side of (8):
$$\sum_{\hat{q} \in \hat{Q}} \sum_{\bar{q} \in Q_{\hat{q}}} E^{h}_{\bar{q}\, i_{\hat{q}\bar{q}}} \cdot \delta_{\bar{q}\, i_{\hat{q}\bar{q}}}(x) \le \sum_{q \in Q} \sum_{i=1}^{n} E^{h}_{qi} \cdot \delta_{qi}(x). \tag{8}$$
Solving the problem with this valuation function, which is also written in (9),
$$\sum_{\hat{q} \in \hat{Q}} \sum_{\bar{q} \in Q_{\hat{q}}} E^{h}_{\bar{q}\, i_{\hat{q}\bar{q}}} \cdot \delta_{\bar{q}\, i_{\hat{q}\bar{q}}}(x) \to \min_{x \in X}, \tag{9}$$
gives the opportunity, with appropriate complexity, to obtain an exact (optimal) solution of problem (2) with the functional on the right side of (8). The set $Q_{\hat{q}}$ on the left side of (8) is a subset of $Q$ defined by a particular $\hat{q}$ and consisting of certain nodes $\bar{q} \in Q$. Looking at Figure 4, a node $\bar{q}$ has to be connected with the node $\hat{q}$ by an edge, i.e., $(\bar{q}, \hat{q})$, an oriented edge with nodes $\bar{q}$ and $\hat{q}$. Meanwhile, it is not necessary to take into account, on the left side of (8) or in (9), the reverse edge $(\hat{q}, \bar{q})$, since its weight equals 0. On the left side of (8), for any $\hat{q} \in \hat{Q}$, the node $\bar{q}$ runs over the edges $(\bar{q}, \hat{q})$ until the set $Q_{\hat{q}}$ is exhausted. We include a node $\bar{q}$ in $Q_{\hat{q}}$ if, during the $h$ runs of the program, there is at least one reference from node $\bar{q}$ to node $\hat{q}$. The index $i$ on the left side of (8) is defined as $\hat{q} \setminus \bar{q}$, i.e., $i = \hat{q} \setminus \bar{q}$, which we denote $i_{\hat{q}\bar{q}}$. The left side of (8) contains fewer addends than the right side; the same can be said about the functional of problem (1), i.e., about $F_0(x)$. As for the optimal solution of (1), it is interesting to point out conditions on the initial data under which the optimal solution of problem (2), say the matrix $x_h^*$, will be an ε-optimal solution of problem (1) in the sense:
$$\Pr\left\{ \left| F_0(x^*) - F_0(x_h^*) \right| \le \varepsilon \right\} \ge 1 - \eta, \quad \varepsilon > 0, \ \eta \in (0, 1), \tag{10}$$
where the matrix $x^*$ is the unknown optimal solution of problem (1). Those conditions, first of all, involve determining general properties of the distribution laws of the variables $\xi_{qi}$, $i = 1, 2, \ldots, n$, $q \in Q$, and a lower bound on the number $h$ of executions (runs) of the code under which inequality (10) holds.
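The paper leaves the specific bound to the distribution laws; as a purely illustrative aside (not the authors' derivation), a standard Hoeffding-type concentration argument gives one way to choose $h$ for a target $(\varepsilon, \eta)$ when the per-run fault counts are bounded:

```python
import math

def runs_needed(eps, eta, fault_range):
    """Illustrative Hoeffding-style bound on h (an assumption, not from the paper):
    if each run's fault count lies in an interval of width fault_range, then
    h >= (fault_range^2 / (2 * eps^2)) * ln(2 / eta) ensures
    Pr{|empirical mean - expectation| <= eps} >= 1 - eta."""
    return math.ceil((fault_range ** 2) / (2 * eps ** 2) * math.log(2 / eta))

print(runs_needed(eps=10.0, eta=0.05, fault_range=100.0))  # -> 185 runs
```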
When the values $E_{qi}$ in (1) are known, i.e., the distribution law of each random variable $\xi_{qi}$, $i = 1, 2, \ldots, n$, $q \in Q$, is known, the algorithm for solving both the initial problem (1) and problem (2) can be based on the valuation function and the following property of the function $\delta_{qi}(x)$:
$$\delta_{\bar{q}' i}(x) + \delta_{\bar{q}'' i}(x) \le \delta_{(\bar{q}' \cup \bar{q}'')\, i}(x) + \delta_{(\bar{q}' \cap \bar{q}'')\, i}(x),$$
which holds for any $\bar{q}', \bar{q}'' \in \hat{Q} \cup Q$ and represents a special case of the property of supermodularity. When the values $E_{qi}$ are unknown (the distribution law of the random variables $\xi_{qi}$, $i = 1, 2, \ldots, n$, $q \in Q$, is unknown), the situation for solving problem (1) becomes more complicated. In this case, problem (2) can be used as an auxiliary problem for (1), and the optimal solution of (2), i.e., $x_h^*$, can be taken as the solution of (1) in the sense of inequality (10).
To evaluate the effectiveness of the proposed approach, simulation experiments were conducted comparing the classical page replacement algorithms WS, LRU, and FIFO. The primary comparison metrics included page fault handling time, page fault frequency, and total program execution time. The obtained results highlight the advantages of the WS algorithm under various memory configurations and workload conditions, while also demonstrating the behavior of the other algorithms in typical virtual memory scenarios. We used a heatmap technique to visualize page faults for different working set sizes, as shown in Figure 5a–c, where the number of page faults is plotted for varying working set sizes at a fixed physical memory size. Figure 5d compares page fault simulation results for the WS, LRU, and FIFO algorithms. Experiments were performed with different parameters: virtual memory size, physical memory size, working set size, access sequence length, and locality factor. The Working Set algorithm tries to keep only actively used pages in memory: if the working set size is smaller than the physical memory size, the number of page faults is low; as the working set size increases, page faults increase accordingly. The LRU algorithm evicts the page that has not been used for the longest time; it performs well, especially when there is locality in the order of accesses.
The FIFO algorithm evicts the page that was loaded first. It shows worse results compared to LRU and Working Set, especially in the presence of cyclic access patterns. The simulation results on page fault optimization show that, when the physical memory size is large, there are fewer page faults for all algorithms, because more pages can be loaded simultaneously, reducing the need for paging. When the physical memory size is small, the number of faults increases because the algorithms are forced to evict pages more frequently. In the heatmap, for a given algorithm and physical memory size, a light cell color indicates a large number of page faults (for example, FIFO with a small physical memory may produce many faults), while a dark cell color indicates a low number of page faults (for example, LRU with a large physical memory). Table 1 shows that Working Set is the most effective when there is locality in the order of accesses, especially when the physical memory size is large. LRU performs consistently well and is a compromise between implementation complexity and efficiency; it effectively minimizes the number of page faults. FIFO is the least efficient, especially when the physical memory size is small, because it does not consider locality and may discard pages that will be used soon.
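The comparative behavior reported in Table 1 can be reproduced in outline with a small simulator. The sketch below (parameters are illustrative, not the paper's exact experimental configuration) counts page faults for FIFO, LRU, and a window-based working-set policy on a synthetic reference string with a tunable locality factor:

```python
import random
from collections import OrderedDict

def make_trace(n_pages, length, locality=0.8, window=8, seed=0):
    """Synthetic reference string: with probability `locality` reuse a recent page."""
    rng = random.Random(seed)
    trace, recent = [], []
    for _ in range(length):
        if recent and rng.random() < locality:
            page = rng.choice(recent[-window:])
        else:
            page = rng.randrange(n_pages)
        trace.append(page)
        recent.append(page)
    return trace

def fifo_faults(trace, frames):
    mem, order, faults = set(), [], 0
    for page in trace:
        if page not in mem:
            faults += 1
            if len(mem) == frames:
                mem.discard(order.pop(0))   # evict the oldest loaded page
            mem.add(page); order.append(page)
    return faults

def lru_faults(trace, frames):
    mem, faults = OrderedDict(), 0
    for page in trace:
        if page in mem:
            mem.move_to_end(page)           # refresh recency on a hit
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)     # evict the least recently used page
            mem[page] = True
    return faults

def ws_faults(trace, k):
    """Working Set policy: keep the pages referenced within the last k accesses."""
    faults = 0
    for t, page in enumerate(trace):
        working_set = set(trace[max(0, t - k):t])
        if page not in working_set:
            faults += 1
    return faults

trace = make_trace(n_pages=64, length=10_000)
print(fifo_faults(trace, 16), lru_faults(trace, 16), ws_faults(trace, k=40))
```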
We evaluated the WS, LRU, and FIFO caching algorithms with a cache size of 5% allocated from free memory. Each of the 10 nodes performed 100,000 operations, with 80% accessing local memory and 20% requesting remote data. Performance was assessed through average page fault time, page fault frequency, and total execution time. Table 2 shows the results and indicates that the WS algorithm consistently outperformed the alternatives. WS reduced the average page fault time to 6 µs and the fault frequency to 35%, yielding a total execution time of 13 s. LRU and FIFO showed similar performance, with average fault times of 7 µs and frequencies around 38%, resulting in execution times of 14.5 s. Random Replacement (RR) performed the worst, with a fault time of 8 µs, a frequency of 40%, and an execution time of 15 s.
The superior performance of WS likely stems from its ability to maintain a set of actively used pages, adapting even to random access patterns. LRU and FIFO, while effective in workloads with locality, offer little advantage here, and RR’s lack of pattern consideration makes it the least efficient. We varied the cache size—1%, 5%, 10%, and 20% of total memory—using the Working Set algorithm, sourcing memory from both free reserves and data memory (overflow). The same workload and system configuration were applied. As cache size increased from 1% to 10%, page fault frequency dropped from 39% to 33%, and total execution time decreased from 14.8 s to 12.8 s as shown in Figure 6a.
Beyond 10%, at 20%, the frequency stabilized at 32%, and execution time plateaued at 12.7 s, indicating diminishing returns. Average page fault time remained steady at around 6 µs across all sizes. With data memory caching, fault frequency decreased from 38% at 1% to 30% at 10%, with execution time improving from 14.5 s to 13.5 s. However, at 20%, execution time rose to 14 s despite a frequency of 29%, as reduced data memory led to more frequent evictions and reloads as shown in Figure 6b. These experiments reaffirm the efficacy of RAM page caching, with the Working Set algorithm proving optimal for random workloads. The choice of algorithm significantly impacts performance, with WS reducing execution time by up to 14% compared to no caching (15 s).

4. Discussion

The results of the simulation experiments show that the locality of accesses to memory pages is a key factor influencing the effectiveness of paging algorithms. Among the evaluated methods, the WS and LRU algorithms exhibit the best performance in reducing page faults, whereas FIFO consistently performs worse across all parameters. Specifically, WS proves effective in retaining active pages in memory, and LRU leverages recent access history, while FIFO’s lack of locality awareness leads to frequent unnecessary evictions. Further analysis of cache size optimization reveals a performance sweet spot around 10% of total memory for both free and overflow memory caching. Increasing the cache size beyond this threshold produces diminishing returns in terms of page fault reduction. Overflow caching, in particular, introduces trade-offs due to reduced available space for data. These observations highlight the need for careful memory allocation policies that balance caching capacity with overall data availability. Although the proposed approach is grounded in a solid theoretical foundation, employing the working set model and a geometric interpretation of the computational process state, our work is not limited to purely theoretical analysis. The proposed model formulates the minimization of page faults as a discrete optimization problem with a set of functionals and constraints, thereby providing a basis for the development of a novel page replacement algorithm. In particular, by utilizing the Hasse diagram and reducing the state space to the critical nodes Q ^ , it becomes possible to construct a computationally efficient algorithm that approximates the optimal solution with a controlled error ε . Thus, the model serves not only as a theoretical tool but also as a foundation for the practical implementation of adaptive and intelligent virtual memory management algorithms that surpass traditional heuristics in quality. Future work will focus on the development and deployment of such an algorithm based on the proposed model, along with performance evaluation under real-world conditions.
It is worth noting that Clock and WSClock, which are practical approximations of LRU and WS, respectively, were not included in this study. However, due to their real-world relevance and lower implementation overhead, we plan to include them in future evaluations to better position our approach within a broader spectrum of paging strategies. Unlike traditional algorithms that make runtime-local decisions, our method formulates page replacement as a discrete optimization problem, minimizing the expected number of page faults across program runs. We introduce a valuation function, combined with Hasse diagram-based reduction of the search space to a core subset of control states. This allows for efficient exploration of ε-optimal configurations under real system constraints. In doing so, our work lays a theoretical and practical foundation for designing more adaptive and intelligent page replacement mechanisms beyond reactive heuristics.

5. Conclusions

Poor code locality motivates the study of program restructuring and optimization strategies, such as the Working Set strategy, to minimize page faults in virtual memory systems. This paper proposes a combinatorial approach to optimizing the working set size, aimed at obtaining near-optimal solutions using a geometric interpretation and functional constraints. To optimize the calculations, a valuation function was found that incorporates the empirical averages from the experiments and a system of constraints. Our proposed geometric interpretation of the computational process in the form of a Hasse diagram helps to reduce the dimensionality of the page fault minimization problem. The approach outlined in the paper provides a basis for finding the optimal solution to the main problem, i.e., page fault minimization, when the distributions of the random variables are known in advance. Otherwise, with unknown distributions, the constructed approach provides the basis for finding an exact solution to the auxiliary problem, i.e., minimizing the experimental average of page faults.
The simulation experiments conducted on page replacement algorithms demonstrated the practical significance and confirmed the effectiveness of the model based on the Working Set strategy and the geometric interpretation of the computational process state. Our results show that algorithms oriented toward the active working set of pages outperform those that disregard such information in terms of both the number of page faults and execution time. This aligns with the theoretical assumptions of the model, where the minimization of page faults is achieved through the optimization of working sets, as reflected in the model’s functionals and constraints. Thus, the results section not only illustrates the overall performance patterns of classical algorithms but also provides evidence of the relevance and practical applicability of the proposed mathematical model, thereby bridging the gap between theory and practice in virtual memory management.
Currently, the Working Set strategy is usually used as a theoretical research base, either for comparison or for auxiliary purposes, as it is considered expensive to implement. However, in our case, if the program code has a block structure, then the results obtained can be used to build a fast, accessible, and affordable swap algorithm.

Author Contributions

Conceptualization, A.M., G.B. and G.A.; methodology, A.M.; software, A.M. and A.B.; validation, A.M., A.B. and G.B.; formal analysis, G.B. and A.M.; investigation, G.B.; resources, A.M.; data curation, A.B. and G.A.; writing—original draft preparation, A.M., G.A. and G.B.; writing—review and editing, A.M. and G.B.; visualization, A.M. and A.B.; supervision, A.M.; project administration, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a grant from the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan, grant number “AP19174930—Research and development of model for programs reorganization and data in segment-page systems based on two-level dictionary and geometric interpretation”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available upon request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Allen, T.; Cooper, B.; Ge, R. Fine-Grain Quantitative Analysis of Demand Paging in Unified Virtual Memory. ACM Trans. Archit. Code Optim. 2024, 21, 1–24.
2. Chen, Y.-C.; Wu, C.-F.; Chang, Y.-H.; Kuo, T.-W. Exploring Synchronous Page Fault Handling. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 41, 3791–3802.
3. Doron-Arad, I.; Naor, J. Non-Linear Paging. In Proceedings of the 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), Tallinn, Estonia, 8–12 July 2024; Volume 297, pp. 57:1–57:19.
4. Wood, C.; Fernandez, E.B.; Lang, T. Minimization of Demand Paging for the LRU Stack Model of Program Behavior. Inf. Process. Lett. 1983, 16, 99–104.
5. Dyusembaev, A.E. On one approach to the problem of segmenting programs. Dokl. Akad. Nauk. 1993, 329, 712–723.
6. Borankulova, G.; Murzakhmetov, A.; Tungatarova, A.; Taszhurekova, Z. Adaptive Working Set Model for Memory Management and Epidemic Control: A Unified Approach. Computation 2025, 13, 190.
7. Ngetich, M.K.Y.; Otieno, C.; Kimwele, M.; Gitahi, S. Advancements in Code Restructuring: Enhancing System Quality through Object-Oriented Coding Practices. In Proceedings of the 2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES), Nairobi, Kenya, 26–28 July 2023; pp. 000125–000130.
8. Yegon Ngetich, M.K.; Otieno, D.C.; Kimwele, D.M. A Model for Code Restructuring, A Tool for Improving Systems Quality in Compliance with Object Oriented Coding Practice. IJCATR 2019, 8, 196–200.
9. Peachey, J.B.; Bunt, R.B.; Colbourn, C.J. Some Empirical Observations on Program Behavior with Applications to Program Restructuring. IEEE Trans. Softw. Eng. 1985, SE-11, 188–193.
10. Roberts, D.B. Practical Analysis for Refactoring. Ph.D. Thesis, University of Illinois, Champaign, IL, USA, 1999.
11. Cedrim, D.; Garcia, A.; Mongiovi, M.; Gheyi, R.; Sousa, L.; De Mello, R.; Fonseca, B.; Ribeiro, M.; Chávez, A. Understanding the Impact of Refactoring on Smells: A Longitudinal Study of 23 Software Projects. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Paderborn, Germany, 4–8 September 2017; pp. 465–475.
12. Agnihotri, M.; Chug, A. Severity Factor (SF): An Aid to Developers for Application of Refactoring Operations to Improve Software Quality. J. Softw. Evol. Process 2024, 36, e2590.
13. Agnihotri, M.; Chug, A. Understanding the Effect of Batch Refactoring on Software Quality. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 2328–2336.
14. Coelho, F.; Massoni, T.; Alves, E.L.G. Refactoring-Aware Code Review: A Systematic Mapping Study. In Proceedings of the 2019 IEEE/ACM 3rd International Workshop on Refactoring (IWoR), Montreal, QC, Canada, 28 May 2019; pp. 63–66.
15. Arasteh, B.; Ghanbarzadeh, R.; Gharehchopogh, F.S.; Hosseinalipour, A. Generating the Structural Graph-based Model from a Program Source-code Using Chaotic Forrest Optimization Algorithm. Expert Syst. 2023, 40, e13228.
16. Arasteh, B.; Abdi, M.; Bouyer, A. Program Source Code Comprehension by Module Clustering Using Combination of Discretized Gray Wolf and Genetic Algorithms. Adv. Eng. Softw. 2022, 173, 103252.
17. Dyusembaev, A.E. Mathematical Models of Program Segmentation; Fizmatlit/Nauka: Moscow, Russia, 2001.
18. Sadirmekova, Z.; Murzakhmetov, A.; Abduvalova, A.; Altynbekova, Z.; Makhatova, V.; Akhmetzhanova, S.; Tasbolatuly, N.; Serikbayeva, S. Approach to Automating the Construction and Completion of Ontologies in a Scientific Subject Field. Int. J. Electr. Comput. Eng. 2024, 14, 3064–3072.
19. Denning, P.J. The Working Set Model for Program Behavior. In Proceedings of the ACM Symposium on Operating System Principles (SOSP '67), Gatlinburg, TN, USA, 1–4 October 1967; ACM Press: New York, NY, USA, 1967; pp. 15.1–15.12.
20. Park, Y.; Bahn, H. A Working-Set Sensitive Page Replacement Policy for PCM-Based Swap Systems. J. Semicond. Technol. Sci. 2017, 17, 7–14.
21. Sha, S.; Hu, J.-Y.; Luo, Y.-W.; Wang, X.-L.; Wang, Z. Huge Page Friendly Virtualized Memory Management. J. Comput. Sci. Technol. 2020, 35, 433–452.
22. Hu, J.; Bai, X.; Sha, S.; Luo, Y.; Wang, X.; Wang, Z. Working Set Size Estimation with Hugepages in Virtualization. In Proceedings of the 2018 IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), Melbourne, VIC, Australia, 11–13 December 2018; pp. 501–508.
23. Nitu, V.; Kocharyan, A.; Yaya, H.; Tchana, A.; Hagimont, D.; Astsatryan, H. Working Set Size Estimation Techniques in Virtualized Environments: One Size Does Not Fit All. In Proceedings of the Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, Irvine, CA, USA, 18–22 June 2018; pp. 62–63.
24. Verbart, A.; Stolpe, M. A Working-Set Approach for Sizing Optimization of Frame-Structures Subjected to Time-Dependent Constraints. Struct. Multidisc. Optim. 2018, 58, 1367–1382.
25. Dyusembaev, A.E. On the Correctness of Algebraic Closures of Recognition Algorithms of the "Tests" Type. USSR Comput. Math. Math. Phys. 1982, 22, 217–226.
26. Sadirmekova, Z.; Sambetbayeva, M.; Serikbayeva, S.; Borankulova, G.; Yerimbetova, A.; Murzakhmetov, A. Development of an Intelligent Information Resource Model Based on Modern Natural Language Processing Methods. Int. J. Electr. Comput. Eng. 2023, 13, 5314.
27. Pâris, J.-F.; Ferrari, D. An Analytical Study of Strategy-Oriented Restructuring Algorithms. Perform. Eval. 1984, 4, 117–132.
28. Denning, P.J. Working Set Analytics. ACM Comput. Surv. 2021, 53, 1–36.
29. Dyusembaev, A.E. Correct models of program segmenting. J. Pattern Recognit. Image 1993, 3, 187–204.
30. Mrena, M.; Kvassay, M. Generating Monotone Boolean Functions Using Hasse Diagram. In Proceedings of the 2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Dortmund, Germany, 7–9 September 2023; pp. 793–797.
Figure 1. Structure of the virtual and physical memory. The swapping process between virtual and physical memory in a paging system. Virtual memory consists of program blocks such as subroutines and data segments that are distributed across multiple pages. Physical memory holds a subset of these pages, reorganized for execution. The control state includes frequently referenced blocks, marked as circles, which form the working set. This structure highlights how poor locality can lead to an increased number of page faults.
Figure 2. Boolean matrix.
Figure 3. Hasse diagram of the combinatorial control state space. The diagram represents the partially ordered set of all possible control states using a Hasse structure. Each node corresponds to a working set of blocks, and edges represent valid transitions based on block additions or removals. The highlighted black nodes are maximal working sets subject to constraint (c) and form the reduced constraint set. These nodes are critical for evaluating and minimizing the number of page faults in the optimization process.
Figure 4. Random walk path across control states in the Hasse diagram. This figure shows an example of a control state trajectory as a random walk over the combinatorial space. Each path corresponds to one run of a program, transitioning between working sets via memory accesses. The black nodes on the path indicate page-fault-critical states used to reduce constraint dimensionality. This walk determines which states are visited frequently and helps to construct the valuation function for optimization.
Figure 5. Comparing simulation results of page faults with different working set sizes in (a–c) and algorithms in (d).
Figure 6. (a) Execution time vs. cache size (free memory); (b) execution time vs. cache size (data memory).
Table 1. Average page faults for different algorithms.

Algorithm      Average Page Faults    Average Locality Factor
Working Set    4352                   0.72
LRU            4371                   0.72
FIFO           4397                   0.72
Table 2. Performance metrics for caching algorithms.

Algorithm    Av. Page Fault Time (µs)    Page Fault Frequency (%)    Total Execution Time (s)
WS           6                           35                          13
LRU          7                           38                          14.5
FIFO         7                           38                          14.5