1. Introduction
Generating partitions of a positive integer by expressing it as a sum of integers is a fundamental problem with wide-ranging applications in mathematics and computation [1,2,3,4]. From a theoretical point of view, there is a large body of research and many conjectures connected to partitions of integers. Algorithms for generating and counting the distinct partitions of a given positive integer n have been studied for many years, and these problems are included in classical books on the generation of basic combinatorial objects such as [5,6].
The motivation for our study is rooted in constructive coding theory. The problem of integer partitioning naturally emerges in the generation of combinatorial configurations that are relevant to the structure and analysis of codes. This work is part of a broader research project dedicated to the parallel implementation of algorithms in constructive coding theory. Within this framework, we have already developed and optimized parallel methods for generating Gray codes, computing the Algebraic Normal Form (ANF) of Boolean functions [7], and performing vectorized operations over finite fields [8]. The generation and analysis of integer partitions arises naturally in this context, particularly when constructing or classifying combinatorial structures. Given the combinatorial explosion inherent in partition-related problems, efficient parallel algorithms are crucial for handling large-scale instances. We have developed two parallel implementations of algorithms that generate the distinct partitions of an integer n using the OpenMP (Open Multi-Processing) parallel programming interface, where each computational unit generates multiple partitions that can be used in further computations. As a case study, we compute the initial terms of the sequence defined by the number of partitions of n such that the sum of the cubes of the parts equals a prescribed value, a problem that requires enumerating and filtering large numbers of partitions under nonlinear constraints.
There are a few main directions connected to algorithms for integer partitions:
Generating all partitions of n into positive integers, including both distinct and repeated summands. The distinct partitions can be generated in ascending, descending and minimal change order.
Generating partitions with restrictions such as restricting the number of summands, restricting the minimal or maximal value of the summands and others. Some problems involving integer partitions may also require adaptive constraints and/or combinations of heterogeneous restrictions, depending on the application context.
The order in which the partitions are generated is an important aspect of the generation algorithm. For some problems, a minimal-change generation ordering (Gray code) [9] is more appropriate. For our purposes, we focus on lexicographic and reverse lexicographic ordering of the partitions. Having an ordering for the target objects, we can also introduce an enumeration for the set of objects. Ranking and unranking functions are used for this purpose. Such functions are given for the basic combinatorial objects (permutations, subsets, etc.) and are also considered for both lexicographic and reverse lexicographic orderings.
Generating the next partition in the ordered set of partitions from a given current one. Like the ranking and unranking functions, this is an important algorithm and is typically given for basic combinatorial objects.
Integer partitions can also be viewed as solutions to certain Diophantine equations. In some special cases, they lead to interesting integer sequences. It is known that the sum of the first n cubes is equal to the square of the sum of the first n integers, i.e., $1^3 + 2^3 + \cdots + n^3 = (1 + 2 + \cdots + n)^2$. In [10], the authors consider sets of integers satisfying a related cube-sum condition, and it has been proven that for every natural number n there is exactly one such set with distinct integer elements. Integer partitions are also connected to some integer sequences described in OEIS [11]. Such sequences are A000607 and A000537, which give the number of partitions of n into prime parts and the sum of the first n cubes, respectively.
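As a quick numerical illustration of this identity (added here for clarity): for $n = 3$ we have $1^3 + 2^3 + 3^3 = 1 + 8 + 27 = 36 = (1 + 2 + 3)^2 = 6^2$.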
On the other hand, integer partitions are used in the generation of combinatorial objects with specific properties [12]. Many of the studied objects can be represented as binary matrices, and partitions can be used in the generation of such matrices. Usually, the constructive part is a backtracking search in which the incidence matrix is extended row by row. The number of 1s in a row is known, and an integer partition determines how many 1s should be placed in each part of the row, according to the arrangement of the already fixed rows. Generating integer partitions is thus a building block of classification algorithms. In many cases, additional computations are performed for each partition in order to obtain objects with the desired properties. Therefore, generating the partitions of the integer n in parallel can be beneficial for classification problems over large sets of objects. In such algorithms, partition generation is a small part of the overall computation, so it is more important to have a flexible algorithm that allows different restrictions; the efficiency of the partition generation itself is not the main concern. Other problems, however, require fast and efficient generation of partitions.
In this work, we consider the problem of generating partitions of positive integers and the following connected problems:
We present two different approaches for generating all distinct partitions—a recursive algorithm and a more efficient non-recursive algorithm. We present some characteristics of both methods.
Ranking, unranking and successor functions that are used for generating the full set of distinct partitions.
Developing parallel implementation of the considered algorithms. We present two strategies for parallelization depending on the algorithm. The first strategy uses the task construct to parallelize the recursive algorithm for generation. Another strategy is to partition the set of all ordered partitions into subsets and generate all partitions in each subset.
Evaluating the effectiveness of the presented parallel implementations. For the evaluation we consider the distribution of work between computational units, the flexibility of the algorithms, and their scalability as the number of computational units increases. As a case study, we also calculate the number of partitions of n such that the sum of the cubes of the parts is equal to a prescribed value.
Sequential integer partitioning algorithms have been implemented in some of the well-known computer algebra systems such as GAP 4.15.0, Magma V2.29, SageMath 10.7, etc. A theoretical development of a parallel algorithm is presented in [13], but the existence of an actual implementation is unknown to us. That parallel algorithm is conceptual and differs significantly from the algorithms presented here.
The current paper is structured as follows: in Section 2 we give the mathematical background on the problem of integer partitions. Section 3 presents the parallel programming model that is used. In Section 4, two sequential algorithms for generating integer partitions, as well as ranking and unranking functions for the two main considered orderings, are given. Section 5 presents the parallel implementations of the algorithms. Computational results are presented in Section 6. Finally, some concluding remarks are given in Section 7.
2. Mathematical Background
A partition of integer
n is a collection of positive integers whose sum is equal to
n. The summands are also called
parts and in the general case their order can be ignored. We denote the set of all distinct partitions of
n with
and their number by
. Two popular ways to represent a partition are as a sequence
or by using the multiplicity of the summands. In this work we consider sequence representation. We can order the summands in the sequence. The partition
of
n is said to be in increasing order if
and in decreasing order if
. For example,
is a partition of
written in decreasing order and
is the same partition in increasing order. For the given orderings of the summands in a partition, we also consider orderings for the partitions themselves. Two natural ways to order the set of partitions
are lexicographic order, where the parts are listed in increasing order, and reverse lexicographic order, where the parts are listed in decreasing order. We denote the set of partitions in lexicographic order by
and when the set is in reverse lexicographic order, we denote it by
.
Table 1 gives an example of
and
for
. In the table we have two sets of columns, the first of which is the rank. The second column lists the partitions in lexicographic order, and the third column gives them in reverse lexicographic order. Most generation algorithms obtain the set of all partitions in reverse lexicographic order as sequences of integers. Some algorithms for the generation of integer partitions are given in [6,14,15,16,17].
There are two fundamental problems connected to partitions: calculating the total number of distinct partitions and generating the set of all distinct partitions. These two problems have been the focus of research through the years due to their many applications [4,18,19,20,21]. Some asymptotic estimates for the number of partitions can be found in [22,23]. These values also form the integer sequence A000041, given in [11], where many of their characteristics are described.
Another way to represent a partition is through the Ferrers-Young diagram, given by an array of dots. For a given partition written in decreasing order, the Ferrers-Young diagram is an array of dots in which the i-th row contains exactly as many dots as the i-th part, and the rows are left-justified. For example, let us consider the partition for . Then D is the Ferrers-Young diagram, where:
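(The original diagram is not reproduced here. As a hypothetical illustration, for the partition 4 + 3 + 1 of 8, the Ferrers-Young diagram has left-justified rows of 4, 3 and 1 dots:
● ● ● ●
● ● ●
●
Reading the columns of this diagram gives the conjugate partition 3 + 2 + 2 + 1.)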
By reflecting the diagram across its main diagonal, we get the conjugate diagram, and the corresponding partition is called the conjugate partition (for the illustration above, this is 3 + 2 + 2 + 1). Ferrers-Young diagrams have many other applications, some of which are given in [24,25].
Some problems use partitions of an integer n that satisfy certain restrictions. One such restriction is having a fixed number of parts. Let us denote the set of all partitions of n having exactly k parts as , and the number of such partitions as . The latter can be calculated as follows:
This equation and algorithms for generating the partitions with exactly k parts in reverse lexicographic order are shown in [6]. It is proven that the number of partitions having exactly k parts is equal to the number of partitions in which the largest part is exactly k. This can be seen from the Ferrers-Young diagrams. Therefore, we can generate the set of partitions with largest part k using the conjugate partitions. An algorithm for generating conjugate partitions is given in [6]. For this value and the reverse lexicographic order, we have the following recurrence relation, also given in [6]:
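The recurrence itself is not reproduced above. As an assumption (in our notation), the standard recurrence for the number $p(n,k)$ of partitions of $n$ into exactly $k$ parts is
$p(n,k) = p(n-1,k-1) + p(n-k,k),$
with $p(0,0) = 1$, $p(n,k) = 0$ for $k > n$, and $p(n,0) = 0$ for $n > 0$: a partition of $n$ into $k$ parts either contains a part equal to 1 (remove it) or has all parts at least 2 (subtract 1 from every part).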
Let us consider the set
that consists of the partitions in
with smallest part at least
k and the set
that consists of the partitions of
with largest part at most
k, respectively. Analogously,
and
give the number of partitions in the corresponding sets. For example, for
and
, if we look at
Table 1, we have
and
. For the values of
and
we have recurrence relations given by Lemma 1 and Lemma 2. These recurrence relations are analogous to (
2). For
and
we also have
.
Lemma 1. Let n and k be positive integers. Then
Proof. Obviously, the summands in any partition of n are at most n, and therefore there are no partitions of n with smallest part greater than n. The only partition with smallest summand equal to n is the partition with a single part.
Let
. We consider the set
as a union of the subsets
with the smallest part equal to
k and
. We need to prove that
. Let us consider the map
defined by
where
is the number of summands in the current partition and all partitions are ordered lexicographically (
). Then
and therefore
is a partition of
. Moreover,
, hence
. This proves that
is an injection from
to
. Furthermore, if
, then
, and so
is a surjection. It follows that
is a bijection from
to
and therefore these two sets have the same cardinality. This proves that in this case
. □
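As a quick numerical check of the recurrence implied by this proof (using $p_{\ge k}(n)$ as assumed notation for the number of partitions of $n$ with smallest part at least $k$): $p_{\ge 2}(5) = p_{\ge 3}(5) + p_{\ge 2}(5-2) = 1 + 1 = 2$, matching the two partitions $5$ and $3 + 2$ of $5$ with all parts at least $2$.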
Lemma 2. Let and be integers. Then
Proof. The proof is similar to the proof of Lemma 1. The only difference is that instead of , we use , where the partitions are in reverse lexicographic order. The number of partitions with largest part at most k, for k equal to n or more, is equal to the number of all partitions of n. There are no partitions with a zero summand, and therefore .
Let us consider the case
. Then
is a union of the subsets
with the biggest part equal to
k and
. We now consider the map
defined by
where
is the number of summands in the current partition. The map
is injective (different partitions in
have different images) and surjective (any partition in
is an image of a partition from
), hence it is a bijection from
to
and therefore these two sets have the same cardinality. This proves that in this case
. □
For basic combinatorial objects, there are also ranking, unranking and successor functions in addition to generation. A ranking algorithm gives the position (rank) of an object in the set of all objects with respect to a given order, while an unranking algorithm obtains the object corresponding to a given rank. A successor function obtains the next object in a chosen order from a given current object. These algorithms can sometimes be used for more efficient generation. Unranking functions in particular can be used for generating objects in parallel. Ranking and unranking functions for both orderings are given in [6]. Successor algorithms are given in [6,14].
3. Parallel Programming Model
OpenMP, short for ‘Open Multi-Processing’, is an Application Programming Interface (API) designed to support parallel programming in shared-memory environments using C, C++, and Fortran [26]. It enables developers to write parallel code that executes concurrently across multiple processors, cores, or threads within a single system. OpenMP simplifies the parallelization process by providing a set of compiler directives and library routines, allowing developers to define parallel regions in their code without managing low-level threading details. This abstraction makes OpenMP an accessible and efficient tool for developing high-performance parallel applications.
OpenMP operates on a shared-memory model, where threads can access both private and shared variables. Work-sharing constructs, such as parallel and for, efficiently distribute tasks among threads, optimizing resource utilization. Renowned for its portability, OpenMP is compatible with various compilers across platforms, making it a versatile tool for parallel programming.
Its simplicity and incremental integration make OpenMP particularly accessible, attracting a broad range of developers, especially in scientific and engineering domains. By leveraging parallelism, OpenMP significantly boosts computational efficiency, making it widely used in applications such as numerical simulations, data analytics, and other compute-intensive tasks.
The tasking execution model (Task construct), illustrated in Figure 1, facilitates the parallelization of irregular (unstructured) patterns and recursive function calls by defining independent units of work that are executed by the threads. Several scenarios are possible: a single creator, typically implemented using a parallel construct followed by single, and multiple creators, leveraging work-sharing and nested tasks to efficiently distribute the workload.
The creation of threads and tasks in OpenMP is generally performed through compiler directives and functions defined in the omp.h header and its associated library. The concept of tasking was introduced in OpenMP 3.0; however, not all compilers support OpenMP 3.0 and its tasking capabilities. Additional information about the OpenMP programming interface can be found in [26].
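As a minimal sketch (not taken from the paper) of the single-creator tasking pattern described above:
```cpp
#include <omp.h>
#include <cstdio>

void process(int i) {
    std::printf("item %d processed by thread %d\n", i, omp_get_thread_num());
}

int main() {
    #pragma omp parallel        // create a team of threads
    {
        #pragma omp single      // a single creator thread generates the tasks
        for (int i = 0; i < 8; ++i) {
            #pragma omp task firstprivate(i)   // independent unit of work
            process(i);
        }
        // implicit barrier at the end of single: all tasks have finished here
    }
    return 0;
}
```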
4. Sequential Implementation
In this section we present two sequential algorithms for generating all partitions of an integer n: a recursive generation and a non-recursive generation using the successor function. We also present ranking and unranking functions. We consider the functions for generating the partitions in lexicographic and reverse lexicographic order. In the presented pseudocode of the algorithms we use the function usePartition() to indicate further computations that use the currently generated partition (e.g., generating the incidence matrix of a specific combinatorial object).
4.1. Recursive Algorithm
The most natural algorithms for generating integer partitions are recursive. Such algorithms provide a powerful and flexible framework for problem solving, enabling elegant solutions to complex challenges by decomposing problems into smaller, more manageable sub-problems. Integer partitioning is an excellent example of the use of recursion. The recursive algorithm essentially breaks down the problem of partitioning a positive integer as a sum of natural numbers into smaller sub-problems, providing a systematic way to explore all possible partitions. Recursive algorithms for the generation of integer partitions are given in [6,14].
We can represent the integer n as the sum of n − k and k, starting with k = n. Afterwards, we recursively apply the same partitioning to the remainder n − k, with parts not exceeding k. In this case, the parts are ordered in decreasing order and the partitions are generated in reverse lexicographic order. In general, we have the following steps:
Base Case: If the input is 0, return an empty set, as there is no partition for it.
Recursive Step: For a given natural number n:
- –
Start with the largest possible value of k (which is n or the previous value of k).
- –
Recursively apply the same partitioning algorithm to the remaining sum n − k.
- –
Continue decreasing the value of k until reaching 1.
Note: We need to be cautious and avoid generating repetitive partitions.
In each recursive call, in order to generate only distinct (and ordered) partitions, the starting value of k is either n or the value of k from the previous recursive call. Keeping a variable that stores the value of k from the previous recursive call also ensures that no duplicate partitions are generated. The recursive step applies the partition algorithm to the remaining sum n − k. This process continues, decreasing the value of k down to 1. In Algorithm 1, we present pseudocode for generating all partitions in reverse lexicographic order:
| Algorithm 1 PARTNUM |
| Input: |
| n – integer to be partitioned in this step |
| – index in where the new summand will be placed |
| – array containing the current partition |
| – value of the part added in the previous recursive call |
| Output: |
| Generates all partitions of n recursively in non-increasing order. |
| Calls usePartition(part) for each generated partition and updates the global variable total. |
| 1: |
| 2: |
| 3: if then |
| 4: |
| 5: for downto 1 do |
| 6: |
| 7: if then |
| 8: call PARTNUM |
| 9: else |
| 10: total ← total |
| 11: call usePartition |
Example usage: To generate all partitions of
, initialize an array
part of length 80 and call:
After the call, total will contain the number of generated partitions, and each partition will have been processed by usePartition(part).
In the presented pseudocode, the function usePartition executes additional computations that use the generated partition. As can be seen, a full partition is generated and can be used only when the end of the recursion is reached. Thus, to generate a new partition with k parts we perform k recursive calls, and to generate all partitions we execute at most n recursive calls for each partition; in the worst case, the total number of recursive calls is therefore at most n times the number of partitions. To generate a new partition, however, we do not need to have the previous partition already generated. Thus, the algorithm can easily be modified to generate partitions under various restrictions. This is one of its main advantages when the partitions are used in the generation of more complex combinatorial objects.
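A minimal C++ sketch of this recursive generation in non-increasing part order, following the idea of Algorithm 1 (the base case here is a remainder of 0, and the names partnum and usePartition as well as the exact parameter list are our assumptions):
```cpp
#include <cstdio>
#include <vector>

static long long total = 0;                // number of generated partitions

void usePartition(const std::vector<int>& part, int len) {
    // placeholder for further computations with the generated partition
    (void)part; (void)len;
}

// n    - remaining value to be partitioned
// pos  - index in part where the next summand is placed
// prev - value of the part chosen in the previous call (bounds the next part)
void partnum(int n, int pos, std::vector<int>& part, int prev) {
    if (n == 0) {                          // a complete partition has been built
        ++total;
        usePartition(part, pos);
        return;
    }
    int start = (prev < n) ? prev : n;     // keep the parts in non-increasing order
    for (int k = start; k >= 1; --k) {
        part[pos] = k;
        partnum(n - k, pos + 1, part, k);
    }
}

int main() {
    int n = 10;
    std::vector<int> part(n);
    partnum(n, 0, part, n);                // generates the 42 partitions of 10
    std::printf("p(%d) = %lld\n", n, total);
}
```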
Alternatively, we can start with the smallest possible value for k, which is 1. In this case, we generate the partitions in lexicographic order. The starting value for k in each recursive call is then the larger of 1 and the previous value of k.
4.2. Non-Recursive Algorithm
Recursive algorithms are generally less efficient than non-recursive ones, but they provide great flexibility and can easily be modified to introduce restrictions, for example to generate all partitions with fewer than k parts. In cases where fast generation of the partitions is more important than having an easily modifiable algorithm, a non-recursive approach is preferred. An algorithm that takes constant time on average to generate the next partition (a CAT algorithm) is presented in [17]. It generates the partitions using the sequence representation, and it is shown to be significantly more effective than generation algorithms based on other representations.
Another approach, based on the algorithm in [17], uses a successor function for the generation. Using such functions, we can generate the partitions in both lexicographic and reverse lexicographic order. A successor function produces the next partition from the current one. Algorithms for calculating the successor of a given partition are presented in [14] and are based on the following lemmas:
Lemma 3. Let . If and for even n or for odd n, the successor of a is . If or , , then the successor of a in is . In the other cases, the successor of a iswhere , , . Proof. Obviously, holds only for the last partition , which has no successor, and therefore we exclude it. It is clear that for even n the only possible successor of in lexicographic order is , the same is true for odd n and .
If or , , but not the above case, then and therefore it is not possible to have a successor with k or more summands.
In the last case
and therefore if the successor is
, then
for
,
,
and
. The first partition of
in lexicographic order such that
, is
for
. To verify the last partition, we calculate the sum
Furthermore, to check whether the parts in b are given in ascending order, we calculate . □
Example 1. Let and . Then and . The successor of a in is . Let us continue with the successors in . The successor of is . Now and .
Lemma 4. Let . Then the successor of a in iswhere q is the position of the right-most part with value , , , , and . Proof. The last partition in this ordering is
and therefore we exclude it. Take
. If
is the right-most part with value
, the first
parts in
a and its successor
are the same, and after that
b contains as many parts
, as possible. Since
we have
. Moreover,
and so
, which gives us that
. Note that the part
x can be equal to 0. □
Example 2. Let and . Then , , and . The successor of a in is . We see here that , and .
Here, we will consider only the algorithm for generating
. As can be seen in Lemma 3, to generate the next partition we only consider the last two parts of the current partition. In the algorithm itself, we look at the values of
and
. We add 1 to the value of
and to retain the sum
n we subtract 1 from
. The pseudocode in Algorithm 2 gives the successor function that generates the next partition in the lexicographically ordered set. This function can be used to generate the full set
.
| Algorithm 2 SUCCESSOR |
| Input: |
| n – integer being partitioned |
| – array containing the current integer partition of n |
| k – index of the last summand in the array () |
| Output: |
| Modifies to the next partition in lexicographic order and calls usePartition(part). |
| 1: |
| 2: if or then |
| 3: return ▹ Invalid index |
| 4: |
| 5: |
| 6: |
| 7: while do |
| 8: |
| 9: |
| 10: |
| 11: if then |
| 12: return ▹ Invalid index |
| 13: |
| 14: call usePartition(part) |
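A C++ sketch of a successor function for partitions stored in ascending order; it follows the idea of Lemma 3, although the exact case analysis in Algorithm 2 may differ (the function name and calling convention are our assumptions):
```cpp
#include <cstdio>
#include <vector>

// Replaces a (an ascending partition of n) by its lexicographic successor.
// Returns false for the last partition (n), which has no successor.
bool successor(std::vector<int>& a) {
    if (a.size() < 2) return false;
    int last = a.back(); a.pop_back();
    int x = a.back() + 1;          // second-to-last part increased by one
    int y = last - 1;              // amount that remains to be redistributed
    if (y < x) {                   // no valid tail exists: merge the last two parts
        a.back() += last;
        return true;
    }
    a.back() = x;
    while (y >= 2 * x) {           // smallest tail: as many parts equal to x as possible
        a.push_back(x);
        y -= x;
    }
    a.push_back(y);                // final tail part, with x <= y < 2x
    return true;
}

int main() {
    int n = 6;
    std::vector<int> a(n, 1);      // first partition in lexicographic order: 1+1+...+1
    long long count = 0;
    do { ++count; } while (successor(a));
    std::printf("p(%d) = %lld\n", n, count);   // prints p(6) = 11
}
```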
4.3. Ranking and Unranking Algorithms
Other functions that are typically discussed together with algorithms for the generation of basic combinatorial objects are ranking and unranking functions. Such functions can be used for partitioning the set of partitions into nearly equal subsets, a key step in the parallelization of the algorithms. Here, we present two sets of ranking and unranking functions, for the lexicographic and the reverse lexicographic order.
Using the recurrence relations, given in Lemma 1 and Lemma 2, we can calculate the rank of a given partition
by using
and
for
and
, respectively. A formula for computing the rank of partition in
is given in (
3). Analogously, a formula for computing the rank of partition in
is given in (
4). We write the partition with a leading zero for both orderings. This is easily avoided in the algorithms themselves by changing the execution order of some computations.
In practice, we use the number of partitions succeeding the given partition. The pseudocode in Algorithm 3 gives the ranking function for the lexicographic order, where a two-dimensional array N contains the precomputed values:
| Algorithm 3 RANK_LEX |
| Input: |
| – array containing the current integer partition of n |
| n – integer being partitioned |
| k – number of summands in the current partition |
| – precomputed table: number of partitions of i whose smallest part is at least j |
| Output: |
| Returns the lexicographic rank of the given partition. |
| 1: |
| 2: ▹ Initialize to largest possible rank |
| 3: for to do |
| 4: |
| 5: |
| 6: |
| 7: return |
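A C++ sketch of one way to compute the lexicographic rank. Here N[i][j] denotes the number of partitions of i with all parts at least j (with N[0][j] = 1), precomputed with the recurrence behind Lemma 1; the 0-based counting of preceding partitions is our convention and may differ from Algorithm 3, which counts succeeding partitions:
```cpp
#include <cstdio>
#include <vector>

// N[i][j] = number of partitions of i with all parts >= j (N[0][j] = 1),
// built with the recurrence behind Lemma 1: N[i][j] = N[i][j+1] + N[i-j][j].
std::vector<std::vector<long long>> buildTable(int n) {
    std::vector<std::vector<long long>> N(n + 1, std::vector<long long>(n + 2, 0));
    for (int j = 0; j <= n + 1; ++j) N[0][j] = 1;
    for (int i = 1; i <= n; ++i)
        for (int j = i; j >= 1; --j)
            N[i][j] = N[i][j + 1] + N[i - j][j];
    return N;
}

// 0-based lexicographic rank of an ascending partition a of n:
// counts the partitions that precede it.
long long rankLex(const std::vector<int>& a, int n,
                  const std::vector<std::vector<long long>>& N) {
    long long r = 0;
    int rem = n, lo = 1;
    for (int part : a) {
        for (int v = lo; v < part; ++v)    // partitions with a smaller part at this position
            r += N[rem - v][v];
        rem -= part;
        lo = part;                         // the next part must not be smaller
    }
    return r;
}

int main() {
    int n = 6;
    auto N = buildTable(n);
    std::vector<int> a = {1, 2, 3};        // the partition 1+2+3 of 6
    std::printf("rank = %lld of %lld\n", rankLex(a, n, N), N[n][1]);  // rank = 5 of 11
}
```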
Analogously, in the unranking function for the lexicographic order we can use the number of partitions of n with a fixed first part k. If the given rank is less than this value, then k is the first part of the corresponding partition. Otherwise, we continue with the next possible value of k and adjust the rank accordingly. The pseudocode in Algorithm 4 gives the unranking function for the lexicographic order, where a two-dimensional array N contains the precomputed values:
| Algorithm 4 UNRANK_LEX |
| Input: |
| n – integer to be partitioned |
| – integer rank () |
| – precomputed table: number of partitions of i whose first part is at least j |
| Output: |
| The array contains the integer partition corresponding to the given rank. |
| The function returns the index of the last summand (i.e., length ). |
| 1: |
| 2: if or then |
| 3: return 0 ▹ Invalid rank |
| 4: |
| 5: |
| 6: while do |
| 7: for to n do |
| 8: ▹ Number of partitions of n with first part x |
| 9: if then |
| 10: |
| 11: |
| 12: |
| 13: |
| 14: break |
| 15: else |
| 16: |
| 17: return |
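A matching C++ sketch of the unranking step, using the table N and the conventions from the rank sketch above (again our reconstruction, not the paper's exact code):
```cpp
#include <vector>

// Inverse of rankLex above: reconstructs the ascending partition of n
// whose 0-based lexicographic rank is r (0 <= r < N[n][1]).
std::vector<int> unrankLex(int n, long long r,
                           const std::vector<std::vector<long long>>& N) {
    std::vector<int> a;
    int rem = n, lo = 1;
    while (rem > 0) {
        for (int v = lo; v <= rem; ++v) {
            long long cnt = N[rem - v][v];  // partitions of rem with smallest part exactly v
            if (r < cnt) {                  // v is the next part of the partition
                a.push_back(v);
                rem -= v;
                lo = v;
                break;
            }
            r -= cnt;                       // skip the partitions whose next part is v
        }
    }
    return a;
}
```
For example, unrankLex(6, 5, N) returns the partition 1 + 2 + 3, consistent with rankLex above.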
The unranking function for the reverse lexicographic order is analogous. We use a 2-dimensional array that contains the values of and we traverse k in decreasing order starting with n. When we add k to the current partition, the next possible value, however, is either k or the new value of n (which is ), depending on which is the smaller of the two. To parallelize an algorithm for generating all partitions, we can use the unranking function to set the first partition for subsets that will be generated in parallel.
4.4. Generating Partitions with Restrictions
There are cases where we need to generate the partitions of an integer n under some restrictions. The most common restrictions in the literature are generating the partitions of n into fewer than k parts and generating the partitions of n whose parts are less than k. In practice, the partitions whose parts are less than k are conjugate to the partitions with fewer than k parts. Thus, if we generate one set, we can easily obtain the other as the conjugate partitions. In the recursive algorithm, we can easily generate the partitions with restricted part values by setting the initial bound on the part value accordingly; see the sketch below. Other restrictions can also be added easily to the recursive algorithm. This is not the case for the non-recursive algorithm, since it generates the next partition based on the current one.
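With the recursive sketch from Section 4.1 (and its hypothetical partnum signature), restricting the part values only changes the initial call:
```cpp
// All partitions of n whose parts are at most maxPart,
// using the hypothetical partnum() sketch from Section 4.1.
int n = 10, maxPart = 3;
std::vector<int> part(n);
partnum(n, 0, part, maxPart);   // the last argument bounds the value of the parts
```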
5. Parallel Implementation
In this section we present different approaches to parallelizing known algorithms for generating integer partitions. The most significant stage in developing a parallel implementation is the way the problem is divided into smaller problems that will be solved in parallel. An inefficient division can result in slower execution times. Another crucial point in the implementation is choosing appropriate data types and minimizing data sharing. In OpenMP, data sharing happens through global memory and requires synchronization. In the current work, we present two parallel strategies for the generation of integer partitions based on two different algorithms. We use OpenMP for the implementations, while commenting on some of its advantages and challenges in developing parallel programs and algorithms.
5.1. Recursive Algorithm
Let us consider the parallel algorithm with a recursive approach for breaking down a number n into a sum of positive integers. For the implementation, we use the task construct. A task is only created at the first level (position) of the integer partition. The partitions are written in a structure in order to be shared in the recursive step between tasks. To evaluate the efficiency of the decomposition of the problem and to calculate the total number of partitions, we also use arrays for counting the number of tasks and the number of computed distinct partitions per task. The main steps in the parallelization of the algorithm are the following:
- 1.
Generate a set consisting of partitions of n with two elements.
- 2.
In parallel, for each element of , generate all partitions of the second element with the function partnum.
A visualization of one possible outcome of the algorithm for n = 5, executed in single-threaded mode and with four threads simultaneously (with the master thread as thread 0), is shown in Figure 2. The integer 5 can be partitioned in seven distinct ways. Based on everything discussed so far, it is evident that the first generated partition of 5 is 5 itself. When the code runs and reaches line 5, since the conditions there are not met, the execution proceeds to ‘else’. Further execution of the code recursively applies the same partitioning algorithm to the remainder. This process continues, decreasing the value of k until it reaches 1.
The arrays defined in this way lead to poor scaling. Since a one-dimensional array is used, independent data elements may share the same cache line, causing each data update to trigger a continuous transfer of the cache line between threads. This is called false sharing, illustrated in Figure 3.
We can use padded arrays as a solution to this issue, ensuring that the elements occupy distinct cache lines. This is achieved by defining two-dimensional arrays to count the number of tasks and the number of computed distinct partitions per task. Using padded arrays requires a deep understanding of the cache size and architecture. If the code is run on a machine with a different cache line size, performance can degrade significantly. This means that padded arrays are not a portable solution. The solution for addressing both false sharing and portability is to use the OpenMP threadprivate directive. This directive allows named common blocks and variables to be private to a thread while remaining global within that thread. In other words, threadprivate preserves global scope within each thread.
The directive #pragma omp threadprivate (list) must appear after the declaration of the listed variables or common blocks. Each thread receives its own copy of the variable or common block, ensuring that data written by one thread is not visible to other threads. The list is a comma-separated collection of file-scope, namespace-scope, or static block-scope variables that do not have incomplete types.
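A minimal illustration of this pattern (the counter name NumPartT follows the paper's description; the rest is our sketch):
```cpp
#include <omp.h>

static long long NumPartT = 0;           // per-thread counter of generated partitions
#pragma omp threadprivate(NumPartT)      // every thread gets its own private copy
```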
In this version, the necessary modifications are made after the inclusion of the libraries, as shown in Algorithm 5:
| Algorithm 5 PARTNUM_PARALLEL |
| Input: |
| n – integer to be partitioned |
| – index in where the new summand will be placed |
| – maximum allowed value for current summand |
| – array representing the current partition |
| Output: |
| Generates all partitions of n recursively in non-increasing order |
| Uses parallel tasks for the first level of recursion |
| Calls usePartition(part) for each generated partition |
| 1: |
| 2: |
| 3: for downto 1 do |
| 4: |
| 5: if then |
| 6: if then ▹ First level: create parallel tasks |
| 7: parallel task: |
| 8: Create a local copy of |
| 9: PARTNUM_PARALLEL |
| 10: else |
| 11: PARTNUM_PARALLEL |
| 12: else |
| 13: Increment thread-local counter NumPartT |
| 14: call usePartition |
Example usage: To generate all partitions of n, initialize the counter total and the thread-local counter NumPartT (globally defined with #pragma omp threadprivate(NumPartT)). Only the master thread makes the initial call in the parallel region. After the parallel execution, the counts from all threads are accumulated into total. Each partition has been processed by usePartition(part).
The thread-specific copy of the variable NumPartT is used to count the number of computed distinct partitions per task. The counting of computed distinct partitions occurs in the partnum function, within the final else condition. The final step is to sum the count of computed distinct partitions that occur per thread. This summation is performed in the main function using the #pragma omp critical directive for synchronization, ensuring protected access to prevent race conditions. The use of the critical directive defines this version of the algorithm, referred to hereafter as Recursion Critical.
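A condensed C++/OpenMP sketch of this strategy, combining first-level task creation, the threadprivate per-thread counter and the critical accumulation; the structure follows Algorithm 5, but the details are our assumptions:
```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

static long long NumPartT = 0;                    // partitions counted by this thread
#pragma omp threadprivate(NumPartT)

void usePartition(const std::vector<int>& part, int len) { (void)part; (void)len; }

void partnumParallel(int n, int pos, std::vector<int>& part, int prev) {
    if (n == 0) { ++NumPartT; usePartition(part, pos); return; }
    int start = (prev < n) ? prev : n;
    for (int k = start; k >= 1; --k) {
        part[pos] = k;
        if (pos == 0) {                           // first level only: one task per branch
            std::vector<int> snapshot = part;     // each task works on its own copy
            #pragma omp task firstprivate(snapshot, k)
            partnumParallel(n - k, 1, snapshot, k);
        } else {
            partnumParallel(n - k, pos + 1, part, k);
        }
    }
}

int main() {
    int n = 40;
    long long total = 0;
    #pragma omp parallel
    {
        #pragma omp single                        // a single creator spawns the tasks
        {
            std::vector<int> part(n);
            partnumParallel(n, 0, part, n);
        }                                         // barrier: all tasks completed here
        #pragma omp critical                      // sum the per-thread counters
        total += NumPartT;
    }
    std::printf("p(%d) = %lld\n", n, total);      // prints p(40) = 37338
}
```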
5.2. Parallelization Through Subsets
Recursive algorithms are typically less effective than non-recursive ones. Therefore, we also consider parallelization of the non-recursive algorithm for generating the set of all partitions in increasing order shown in [
14]. One approach for parallelization uses an unranking function for the set. The main idea is to partition the set
into
t equal subsets, where
t is the number of threads, and each thread would calculate
partitions. If
is not an integer, then each thread except one generates
and a single thread will generate
partitions. The algorithm has the following main steps:
- 1.
Calculate and a 2-dimensional array N, where contains the number of partitions of n with all parts greater than or equal to k;
- 2.
Calculate and the rank of the first partition (, where is the id of the current thread) for each thread;
- 3.
Generate the first partition using the unranking function;
- 4.
Generate next partition until partitions have been generated.
Let us now comment on each step in more detail. We use to partition the set of all partitions into subsets. Here, the 2-dimensional array N is used in the unranking function that generates the first partition in each subset. The functions to calculate and array N are executed once before the parallel region and take less than 5% of the execution time. The rest of the steps are executed in parallel by multiple threads. For the parallelization, we use a parallel for construction to divide the computational work among the threads. Each thread executes the computations for a single iteration of the for cycle.
The second step consists of calculating the number of partitions that each thread needs to generate. The presented division method distributes the work almost equally: only one thread generates more partitions. For a large enough number of partitions, the difference in the workload is negligible. For example, there are 627 partitions of 20; with 24 threads, each thread generates 26 partitions and the last thread generates 3 more.
After we have obtained the initial partitions using the unranking function, each thread generates the next partition [14] until it has produced its assigned number of partitions. A global variable is updated to calculate the total number of partitions using the critical clause. This update is executed only once per thread and therefore has minimal impact on the execution time. The pseudocode in Algorithm 6 gives the parallel version.
| Algorithm 6 SUBSETS |
| Input: |
| n – integer to be partitioned |
| – number of parallel threads |
| – precomputed partition table |
| Output: |
| Generates all partitions of n in parallel |
| Calls usePartition(a) for each generated partition |
| 1: |
| 2: compute_partition_table |
| 3: total ▹ total number of partitions |
| 4: sum |
| 5: parallel for each thread to : |
| per_th |
| beg_local per_th |
| last ← total per_th |
| if , then per_th ← last |
| num_local |
| initialize local array |
| UNRANK |
| 6: while num_local < per_th do |
| 7: |
| 8: |
| 9: |
| 10: while do |
| 11: |
| 12: |
| 13: |
| 14: |
| 15: num_local ← num_local + 1 |
| 16: call usePartition |
| 17: critical section: |
| sum ← sum + num_local |
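A C++/OpenMP sketch of the subset strategy, reusing buildTable(), unrankLex() and successor() from the sketches in Section 4 (the structure follows Algorithm 6, but names and details are our assumptions):
```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

// Splits the rank range [0, p(n)) into one contiguous block per thread;
// every thread unranks its first partition and then iterates the successor.
long long generateBySubsets(int n, int numThreads) {
    auto N = buildTable(n);                        // N[i][j]: partitions of i, parts >= j
    long long total = N[n][1];                     // total number of partitions of n
    long long sum = 0;

    #pragma omp parallel num_threads(numThreads)
    {
        int t  = omp_get_num_threads();
        int id = omp_get_thread_num();
        long long perThread = total / t;
        long long begin     = (long long)id * perThread;
        // the last thread also takes the remaining total % t partitions
        long long count     = (id == t - 1) ? total - begin : perThread;

        std::vector<int> a = unrankLex(n, begin, N);   // first partition of this block
        long long numLocal = 0;
        for (long long i = 0; i < count; ++i) {
            ++numLocal;                            // usePartition(a) would go here
            if (i + 1 < count) successor(a);
        }
        #pragma omp critical                       // accumulate once per thread
        sum += numLocal;
    }
    return sum;
}

int main() {
    int n = 20;
    std::printf("p(%d) = %lld\n", n, generateBySubsets(n, 4));  // prints p(20) = 627
}
```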
5.3. Computational Aspects
Let us consider some of the computational aspects of the presented parallel implementations. The integer partition algorithms were parallelized using OpenMP multi-threading in a shared-memory environment. For the recursive algorithm, the computation naturally forms a search tree. We applied task-level parallelism, assigning independent branches to different threads, enabling concurrent exploration of disjoint subproblems. For the iterative, non-recursive algorithm, parallelization was achieved over independent loop iterations, allowing multiple threads to process separate portions of the solution space simultaneously. Overall, parallelism is achieved through concurrent execution of tasks rather than through multi-process execution or traditional data-level parallelism. Due to the nature of the algorithms themselves, no GPU acceleration or SIMD vectorization was employed.
Both versions (using recursion and subsets) demonstrate strong scalability. For the implementation using recursion, we have resolved the initial scalability issue that occurs when counting the total number of partitions by using the threadprivate directive. In practice, when the partitions are used in algorithms for constructing combinatorial objects, we are not interested in the total number of partitions. Furthermore, in both variants the individual threads generate a nearly equal number of partitions, ensuring efficient load balancing and good resource utilization. In the recursive version this is achieved through the use of tasks. The benefits of tasks are evident, since they provide great flexibility and ease of use. The problem can be divided into a larger or smaller number of tasks, depending on the recursion level at which the parallelization occurs. While increasing the number of tasks leads to a more balanced work distribution, it also introduces overhead from creating the additional tasks. Moreover, the sequential portion of the program grows, which reduces scalability. Therefore, a compromise must be made, taking the available hardware resources into account. In the implementation using subsets, the work distribution is generally satisfactory in the presented cases. However, it can be made even more uniform as follows: the remaining m partitions assigned to the last thread can be redistributed among the last m threads. The good scalability in both cases is also seen in the experimental results, given in Section 6.
Another main aspect of parallel algorithms is the communication between computational units. The proposed implementations do not require communication between threads. The Recursion Critical algorithm explicitly illustrates how to prevent false sharing, thereby improving performance and the independence of the computations. In the version with subsets, there is no communication at all. With these measures, the proposed implementations are not sensitive to hardware characteristics such as cache-line size and NUMA nodes.
Lastly, we can consider the use of extended vector registers. Such optimization is not applicable here since additional data is not used in the generation process. Thus, they do not directly enhance the performance of the number partitioning algorithms that are described. However, extended registers can be used in additional computations that can be executed in the generation algorithms for different combinatorial objects.
7. Conclusions
In the current work, we consider algorithms for generating the partitions of an integer n, modifications that generate partitions with restrictions, and ranking and unranking functions for integer partitions in both lexicographic and reverse lexicographic order. We present two parallel implementations for generating partitions. Another concept for parallelization, presented in [13], uses n processors, where each process does not compute the full partition, but only a part of it. In the presented implementations, a thread generates a full partition that can then be used for additional computations. We compare the execution times of the parallel implementations to those of the sequential ones. The experimental results show that the parallelization using the unranking function and subsets has better scalability than the parallelization of the recursive algorithm. The C/C++ code of the algorithms can be found in [29] and the parallel functions in Appendix A.
On multi-node computer systems, a distributed-memory interface like MPI, or a hybrid MPI+OpenMP setup, is required; there, the implementation of the non-recursive version will be much more natural. Additionally, the unranking function enables the generation of integer partitions with the CUDA interface for GPU accelerators. This approach is particularly efficient in scenarios where further computations with the generated partitions are not required, offering both speed and convenience. Research in this area remains future work. We also present a new integer sequence that gives the number of partitions of n satisfying the cube-sum condition considered in the case study.