1. Introduction
An orthogonal array $\mathrm{OA}(N; t, k, v)$ is a mathematical structure consisting of an $N \times k$ array over the set $\{0, 1, \ldots, v-1\}$ such that every $t$-subset of columns contains exactly once every $t$-tuple over $\{0, 1, \ldots, v-1\}$ (hence $N = v^t$). Value $v$ is called the order or the alphabet of the orthogonal array, and value $t$ is called the strength of the orthogonal array. One closely related design is a covering array $\mathrm{CA}(N; t, k, v)$, where every $t$-subset of columns contains at least once every $t$-tuple over $\{0, 1, \ldots, v-1\}$. The covering array number $\mathrm{CAN}(t, k, v)$ is the minimum number of rows $N$ for which there exists a CA with $k$ columns, strength $t$, and order $v$: $\mathrm{CAN}(t, k, v) = \min\{N : \exists\, \mathrm{CA}(N; t, k, v)\}$.
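As a small illustration of these definitions (an example included here for concreteness; it is not one of the designs studied in this work), the array below is an $\mathrm{OA}(4; 2, 3, 2)$, and hence also a $\mathrm{CA}(4; 2, 3, 2)$; since $N \ge v^t = 4$ must always hold, $\mathrm{CAN}(2, 3, 2) = 4$:

```latex
% Every pair of columns contains each of 00, 01, 10, 11 exactly once.
\[
\begin{pmatrix}
0 & 0 & 0\\
0 & 1 & 1\\
1 & 0 & 1\\
1 & 1 & 0
\end{pmatrix},
\qquad
\mathrm{CAN}(2, 3, 2) = 4.
\]
```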
When $v$ is a prime power and $v \ge t - 1$, there exists an $\mathrm{OA}(v^t; t, v+1, v)$ [1]; that is, with $v^t$ rows it is possible to construct a CA where the number of columns is one unit more than the order. In this work, the focus is on CAs with $v = 7$ and $3 \le t \le 6$; then, the construction of CPHFs starts with a random CPHF with 8 columns (one unit more than $v = 7$).
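One classical way to realize such an orthogonal array is the well-known index-unity (polynomial) construction; the sketch below is a standard description included for completeness, and it may differ in details from the construction given in [1]:

```latex
% Rows are indexed by the v^t polynomials f over F_v of degree < t;
% columns are indexed by the v elements a_1, ..., a_v of F_v, plus one
% extra column holding the coefficient f_{t-1} of x^{t-1} in f:
\[
\mathrm{row}(f) \;=\; \bigl(\, f(a_1),\; f(a_2),\; \ldots,\; f(a_v),\; f_{t-1} \,\bigr).
\]
% Any t of the v+1 columns determine f uniquely (polynomial
% interpolation), so every t-tuple appears exactly once and the array
% is an OA(v^t; t, v+1, v).
```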
The motivation to do this research is twofold: theoretical and practical. On the theoretical side, the improvement of CANs helps to understand the asymptotic behavior of the number of rows needed to construct a predefined number of columns given an alphabet ($v$) and a strength ($t$). On the practical side, constructing improved CAs enhances their application in areas such as software and hardware testing [2,3]. In these contexts, the fewer rows a specific CA has, the less computing time and the fewer human resources are required to ensure a certain level of quality in software and hardware components. In addition to this motivation, the computational effort to construct CAs in the domain of CPHFs is less than the computational effort needed to construct CAs directly in the CA domain (this fact will become clearer later).
It is well known that finding the exact value of $\mathrm{CAN}(t, k, v)$ for general values of $t$, $k$, and $v$ is a significant combinatorial challenge; many results have been published on computing upper bounds of CAN values for $v = 7$; see [4,5,6,7]. These studies remain relevant because, compared to the best-published results reported in [8], the mentioned papers still provide many of the current upper bounds. Naturally, it is to be expected that the current upper bounds for $v = 7$ can be further improved. The definition of new upper bounds on the size of CAs has been addressed in several ways (see [9] and references therein). One active area of research is the construction of CAs with a maximum number of factors and a smaller number of rows (i.e., to improve the currently known upper bounds for CAN).
A CPHF, denoted by $\mathrm{CPHF}(n; k, v, t)$, has four parameters ($n$, $k$, $v$, and $t$), where $n$ is the cardinality of the row set, $k$ is the number of columns, $v$ is the order or alphabet, which must be a prime power, and $t$ indicates the coverage strength of the CPHF; the covering properties of CPHFs will be explained later in this section.
CPHFs are very convenient representations of CAs for two main reasons: (1) a CPHF with $n$ rows accounts for $n v^t$ rows in its associated covering array; and (2) the coverage verification for a CPHF has a computational complexity proportional to $\binom{k}{t} n t^3$, while the coverage verification in a covering array has a computational complexity proportional to $\binom{k}{t} N$, where $N = n v^t$ is the number of rows of the CA. To appreciate this advantage, assume a $\mathrm{CPHF}(3; 25, 7, 6)$; the corresponding CA will have $3 \cdot 7^6 = 352{,}947$ rows, i.e., a $\mathrm{CA}(352{,}947; 6, 25, 7)$ is obtained. Then, the computational effort to verify the CPHF will be on the order of $\binom{25}{6} \cdot 3 \cdot 6^3 = 114{,}760{,}800$, whereas the effort to verify the CA will be on the order of $\binom{25}{6} \cdot 352{,}947 = 62{,}506{,}913{,}700$.
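The figures in this comparison can be reproduced with a few lines of C (a sketch written for this presentation, using only the formulas $\binom{k}{t} n t^3$ and $\binom{k}{t} n v^t$ stated above):

```c
#include <stdio.h>

/* binomial coefficient C(n, r) in 64-bit arithmetic */
static unsigned long long binom(unsigned n, unsigned r) {
    unsigned long long b = 1;
    for (unsigned i = 1; i <= r; i++)
        b = b * (n - r + i) / i;
    return b;
}

int main(void) {
    unsigned long long n = 3, k = 25, v = 7, t = 6;
    unsigned long long vt = 1;                 /* v^t = 7^6 = 117,649  */
    for (unsigned i = 0; i < t; i++) vt *= v;
    unsigned long long subsets = binom(k, t);  /* C(25,6) = 177,100    */
    /* CPHF verification: C(k,t) * n * t^3 = 114,760,800 */
    printf("CPHF effort: %llu\n", subsets * n * t * t * t);
    /* CA verification: C(k,t) * n * v^t = 62,506,913,700 */
    printf("CA effort:   %llu\n", subsets * n * vt);
    printf("CA rows:     %llu\n", n * vt);     /* 352,947 */
    return 0;
}
```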
It was decided to construct improved upper bounds on the size of covering arrays for $v = 7$ using a CPHF representation (since the next prime power greater than 5 is 7). To achieve this goal, a greedy column-extension algorithm that starts with a random CPHF with 8 columns was used.
CPHFs were first described by Sherwood et al. [10]; extensions of that work were given in [6,11,12,13,14] and references therein. The elements of CPHFs are $t$-tuples over $\mathbb{F}_v$, the finite field with $v$ elements. Let $(c_{t-1}, \ldots, c_1, c_0)$ be the base $v$ representation of $c \in \{0, 1, \ldots, v^t - 1\}$, that is, $c = \sum_{l=0}^{t-1} c_l v^l$; and let $h = (h_{t-1}, \ldots, h_1, h_0)$ be an element of a $\mathrm{CPHF}(n; k, v, t)$. This element generates a vector of size $v^t$, where for $0 \le i \le v^t - 1$ the symbol in position $i$ is $\sum_{l=0}^{t-1} h_l i_l$, with $(i_{t-1}, \ldots, i_0)$ the base $v$ representation of $i$ and all arithmetic done in $\mathbb{F}_v$. In other words, the vector of size $v^t$ is generated by computing the inner product of $h$ with the base $v$ representation of each $i \in \{0, 1, \ldots, v^t - 1\}$.
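The generation of the column vector of a CPHF element can be sketched in C as follows (an illustrative fragment, not the authors' implementation; it assumes a prime order such as $v = 7$, so that arithmetic in $\mathbb{F}_v$ is arithmetic mod $v$, and it stores the element $h$ with h[l] holding the coefficient $h_l$):

```c
#include <stdio.h>

#define V 7   /* order (prime, so F_v arithmetic is mod V) */
#define T 3   /* strength */

/* Fill col[0..V^T-1]: position i gets the inner product of h with the
   base-V digits of i, i.e., sum over l of h[l] * i_l (mod V). */
static void cphf_column(const int h[T], int col[]) {
    int vt = 1;
    for (int l = 0; l < T; l++) vt *= V;
    for (int i = 0; i < vt; i++) {
        int c = i, sym = 0;
        for (int l = 0; l < T; l++) {   /* extract digit i_l, low to high */
            sym = (sym + h[l] * (c % V)) % V;
            c /= V;
        }
        col[i] = sym;
    }
}

int main(void) {
    int h[T] = {2, 0, 5};   /* an arbitrary CPHF element */
    int col[343];           /* V^T entries */
    cphf_column(h, col);
    for (int i = 0; i < 10; i++) printf("%d ", col[i]);
    printf("...\n");
    return 0;
}
```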
A $t$-tuple of CPHF elements generates an array of size $v^t \times t$ by placing as column vectors the vectors of length $v^t$ that are generated by the $t$ CPHF elements. If this array is an $\mathrm{OA}(v^t; t, t, v)$, then the $t$-tuple of CPHF elements is deemed covering; otherwise, the $t$-tuple is deemed non-covering [10]. In a $\mathrm{CPHF}(n; k, v, t)$, every $t$-subset of columns contains at least one covering tuple of CPHF elements.
A $\mathrm{CPHF}(n; k, v, t)$ produces a $\mathrm{CA}(n v^t; t, k, v)$ when the CPHF elements are replaced by their corresponding column vectors of length $v^t$. However, the resulting covering array has a certain number of repeated rows, which can be eliminated to reduce the size of the covering array. The number of repeated rows in the corresponding CA depends on the repetition of values in the columns of the CPHF. See, for example, the CPHF shown in Table 1 (matching elements in each column are colored in cyan); the corresponding CA is shown in Table 2, with repeated rows colored in cyan (this means that after the removal of the repeated rows, a smaller CA is obtained).
The advantage of the CPHF representation is that it allows for efficient verification of the covering conditions. The covering test is as follows: a $t$-tuple of CPHF elements is a covering tuple if and only if the $t \times t$ matrix whose rows are the $t$ CPHF elements is nonsingular [11]. Then, to verify whether a given $n \times k$ array over $\mathbb{F}_v^t$ is a $\mathrm{CPHF}(n; k, v, t)$, it is necessary to validate that each subarray of $t$ columns contains at least one covering tuple. At most, it is necessary to solve $\binom{k}{t} n$ linear systems of size $t \times t$, each of which takes $O(t^3)$ time using Gaussian elimination.
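For a prime order such as $v = 7$, the covering test reduces to a rank computation mod $v$. The following C sketch (our illustration; for prime powers that are not prime, finite-field arithmetic tables would be needed instead of mod-$V$ operations) tests whether the $t \times t$ matrix whose rows are the $t$ CPHF elements is nonsingular:

```c
#include <stdbool.h>
#include <stdio.h>

#define V 7   /* prime order: every nonzero element has an inverse mod V */
#define T 3   /* strength */

/* True iff the T x T matrix m (entries in 0..V-1) is nonsingular over
   F_V, i.e., iff the t-tuple of CPHF elements forming its rows is a
   covering tuple. Gaussian elimination mod V, O(T^3) operations. */
static bool is_covering(int m[T][T]) {
    for (int c = 0; c < T; c++) {
        int piv = -1;                       /* find a nonzero pivot */
        for (int r = c; r < T; r++)
            if (m[r][c] != 0) { piv = r; break; }
        if (piv < 0) return false;          /* singular */
        for (int j = 0; j < T; j++) {       /* move pivot row into place */
            int tmp = m[c][j]; m[c][j] = m[piv][j]; m[piv][j] = tmp;
        }
        int inv = 1;                        /* inverse of pivot mod V */
        for (int x = 1; x < V; x++)
            if (m[c][c] * x % V == 1) { inv = x; break; }
        for (int r = c + 1; r < T; r++) {   /* eliminate below the pivot */
            int f = m[r][c] * inv % V;
            for (int j = c; j < T; j++)
                m[r][j] = ((m[r][j] - f * m[c][j]) % V + V) % V;
        }
    }
    return true;                            /* full rank */
}

int main(void) {
    int a[T][T] = { {1, 2, 3}, {0, 4, 5}, {0, 0, 6} };  /* nonsingular */
    printf("%s\n", is_covering(a) ? "covering" : "non-covering");
    return 0;
}
```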
The structure of the document is as follows:
Section 2 reviews the randomized extension algorithm of [
11] to add new columns to an initial CPHF;
Section 3 describes our greedy method of adding new columns to a given CPHF;
Section 4 presents the most important results that were obtained using the new greedy column-extension method; and
Section 5 presents the conclusions.
2. Related Works
Colbourn et al. [11] introduced a random extension algorithm to increase the number of columns of an initial CPHF with $k_0$ columns. The initial $\mathrm{CPHF}(n; k_0, v, t)$ is created by a randomized algorithm. At the beginning of the algorithm, an array of size $n \times k_0$ (remember that $n$ is the number of rows of the CPHF and $k_0$ is the initial number of columns) is filled uniformly at random with elements from the set $\mathbb{F}_v^t$; this array might not be a CPHF because it is possible that some subarrays of $t$ columns do not have a covering tuple (remember that $t$ defines the strength of the CPHF). From these subarrays, one column is selected at random and its $n$ entries are again assigned random elements from $\mathbb{F}_v^t$; this process of resampling columns is repeated until the array becomes a $\mathrm{CPHF}(n; k_0, v, t)$.
Now that the initial $\mathrm{CPHF}(n; k_0, v, t)$ has been generated, the task is to add as many columns as possible to form a $\mathrm{CPHF}(n; k, v, t)$ with $k > k_0$ columns (i.e., try to increase the number of columns). The extension algorithm generates random columns until it finds one that makes a CPHF with the ongoing solution, which at first is the initial $\mathrm{CPHF}(n; k_0, v, t)$. Thus, the first new solution is a $\mathrm{CPHF}(n; k_0 + 1, v, t)$, and it becomes the new ongoing solution. The process is repeated in an attempt to construct a $\mathrm{CPHF}(n; k_0 + 2, v, t)$. The number of columns to add to the initial CPHF is not predefined; instead, the algorithm performs a predefined number of iterations, where an iteration consists of generating a random column and checking whether it can be appended to the current solution.
However, the algorithm has an interesting strategy for cases in which the random column does not form a CPHF with the ongoing solution, $A$. Let $X \mid Y$ be the horizontal concatenation of arrays $X$ and $Y$, and let $C$ be a random column. For each $(t-1)$-set $B$ of columns from $A$, the extension algorithm determines whether the array $B \mid C$ contains at least one covering tuple; if not, the algorithm performs the following two actions:
Increment by one unit the value of a variable numuncov that is used to store the number of uncovered subarrays, that is, the number of subarrays that have no covering tuple as a row.
Increment the value of uncov[i] by one unit for each column $i \in B$. The array uncov stores the number of times that column $i$ of the ongoing solution participates in an uncovered subarray.
If $\max(\mathrm{uncov}[i] : 1 \le i \le k) = \mathrm{numuncov}$, then the random column $C$ replaces one of the columns $i$ such that $\mathrm{uncov}[i] = \mathrm{numuncov}$. Thus, the ongoing solution, $A$, is not extended, but it is modified because one of its columns was replaced by a randomly generated column.
This strategy attempts to eliminate from the current solution the columns that do not allow the addition of new columns. A large number of CAs were refined by this random extension algorithm.
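The bookkeeping behind this replacement rule can be sketched in C as follows (an illustrative fragment with hypothetical data; in the actual algorithm of [11], uncov[] and numuncov are filled by the covering checks described above):

```c
#include <stdio.h>

/* Replacement rule of [11]: the candidate column C replaces a column i
   of the ongoing solution only when uncov[i] == numuncov, i.e., when
   column i participates in ALL the uncovered cases; this sketch returns
   the first such column (the original may choose among ties at random). */
static int pick_column_to_replace(const int uncov[], int k, int numuncov) {
    for (int i = 0; i < k; i++)
        if (uncov[i] == numuncov)
            return i;
    return -1;   /* no single column explains every uncovered subarray */
}

int main(void) {
    int uncov[] = {1, 3, 0, 3, 2};   /* hypothetical counters for k = 5 */
    int numuncov = 3;                /* hypothetical number of uncovered cases */
    int i = pick_column_to_replace(uncov, 5, numuncov);
    if (i >= 0) printf("replace column %d with the random column C\n", i);
    else        printf("keep the ongoing solution unchanged\n");
    return 0;
}
```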
Colbourn and Lanus [15] used the random extension algorithm from [11] to construct restricted CPHFs. Let $A$ be a $\mathrm{CPHF}(n; k, v, t)$, and denote by $A_{i,j,l}$ the $l$-th entry of the element at column $j$ and row $i$ of $A$. A restriction $S$ of replication $r$ and dimension $m$ is given by an $r$-tuple $(\rho_1, \ldots, \rho_r)$ of distinct elements from $\{1, \ldots, n\}$ and by an $r$-tuple $(\sigma_1, \ldots, \sigma_r)$, where each $\sigma_i$ is an $m$-tuple of distinct elements from $\{1, \ldots, t\}$. Denote by $\sigma_i(l)$ the $l$-th element of $\sigma_i$.
$A$ satisfies $S$ if $A_{\rho_i, j, \sigma_i(l)} = A_{\rho_1, j, \sigma_1(l)}$ for all $1 \le i \le r$, $1 \le l \le m$, and $1 \le j \le k$. A CPHF can satisfy one or more restrictions.
Wagner et al. [16] recently proposed an extension of the IPO algorithm, called CPHIPO, which works directly in the CPHF domain rather than in the CA domain; many new CANs were reported.
Other related algorithms are those that expand a given covering array by adding more columns and possibly more rows. However, in the case of CAs, the columns to be added are not generated randomly; they are generated through a process that tries to minimize the number of $t$-tuples over $\{0, 1, \ldots, v-1\}$ that are not covered in the new subarrays of $t$ columns created by adding the new column. Examples of these algorithms are the IPO algorithm [17], the coverage inheritance algorithm [18], and the two-stage algorithm of [5].
3. Greedy Column Extension Algorithm
The greedy algorithm of this work adds columns to a given $\mathrm{CPHF}(n; k_0, v, t)$ with $k_0$ columns. For each new column added, a new array is obtained. Then, the results of the algorithm are $s$ arrays over $\mathbb{F}_v^t$ with the respective numbers of columns $k_0 + 1$, $k_0 + 2$, …, $k_0 + s$; each of these arrays may or may not be a complete $\mathrm{CPHF}(n; k, v, t)$, where $k_0 + 1 \le k \le k_0 + s$.
During the execution of the algorithm, the number of columns of the ongoing solution increases and decreases, so several solutions can be generated with the same number of columns. At any time, bestGlobalSolutions[$j$] contains the best solution found with $k_0 + j$ columns, where $0 \le j \le s$; in particular, bestGlobalSolutions[0] contains the initial CPHF with $k_0$ columns. The best solution found for a certain number of columns $k$ is the array of size $n \times k$ over $\mathbb{F}_v^t$ that has the smallest number of uncovered subarrays among all arrays with $k$ columns that have been constructed by the algorithm.
To add a new column to the ongoing solution $B$ with $k$ columns, the algorithm generates numCandidates random columns, and the one that produces the smallest number of uncovered subarrays is added to $B$, which now has $k + 1$ columns. Each time a solution with $k + 1$ columns is constructed from a solution with $k$ columns, the algorithm checks whether the new ongoing solution with $k + 1$ columns improves the best global solution with $k + 1$ columns, and in the positive case, the algorithm stores the new solution in bestGlobalSolutions[$k + 1 - k_0$].
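This candidate-selection step can be sketched in C as follows (a schematic fragment: the score() function below is only a stand-in for uncoveredSubarrays(), included to keep the sketch self-contained, and the sizes N and VT are arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>

#define N 5        /* CPHF rows */
#define VT 343     /* |F_v^t| = 7^3 for v = 7, t = 3 */

/* Stand-in objective: the real algorithm scores a candidate column by
   the number of uncovered subarrays it leaves (uncoveredSubarrays());
   this dummy score only keeps the sketch compilable. */
static int score(const int col[N]) {
    int s = 0;
    for (int i = 0; i < N; i++) s += col[i] % 3;   /* placeholder */
    return s;
}

/* Pick, among numCandidates random columns, the one with the smallest
   score, and copy it into best[]. */
static void bestRandomColumn(int numCandidates, int best[N]) {
    int cand[N], bestScore = -1;
    for (int c = 0; c < numCandidates; c++) {
        for (int i = 0; i < N; i++) cand[i] = rand() % VT;
        int s = score(cand);
        if (bestScore < 0 || s < bestScore) {
            bestScore = s;
            for (int i = 0; i < N; i++) best[i] = cand[i];
        }
    }
}

int main(void) {
    srand(12345);
    int col[N];
    bestRandomColumn(15000, col);   /* numCandidates = 15,000 as in the paper */
    for (int i = 0; i < N; i++) printf("%d ", col[i]);
    printf("\n");
    return 0;
}
```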
After the addition of a new column, the algorithm deletes from the new ongoing solution $B$ up to $\delta$ columns ($\delta$ is the maximum number of columns that are deleted when trying to improve results). At each step, the algorithm deletes from the current solution the column involved in the largest number of uncovered subarrays. Suppose that the ongoing solution $B$ has $k$ columns, where $k_0$ is the number of columns of the initial CPHF; then, the number of columns to be removed from $B$ is $p = \min(\delta, k - k_0)$. The $p$ columns are removed from $B$ one at a time, producing $p$ arrays with the respective numbers of columns $k - 1, k - 2, \ldots, k - p$. The objective of producing these $p$ arrays is to refine the best global solutions with $k - 1, k - 2, \ldots, k - p$ columns. Each array with $k - j$ columns ($1 \le j \le p$) that improves its corresponding best global solution becomes the new best global solution with $k - j$ columns, and so it is stored in bestGlobalSolutions[$k - j - k_0$].
After deleting the $p$ columns from the ongoing solution with $k$ columns, there are two possible cases:
None of the best global solutions with $k - 1, \ldots, k - p$ columns were refined.
At least one of the best global solutions with $k - 1, \ldots, k - p$ columns was refined.
In the first case, the algorithm continues its normal work, and it generates numCandidates random columns again to select the one that will be added to the ongoing solution.
The second case is more interesting. Suppose that the smallest refined best global solution is the one with $k - j$ columns ($1 \le j \le p$). Then, this array becomes the new ongoing solution $B$, and the process of adding one more column to the ongoing solution is restarted from this new ongoing solution.
Column removal is what makes this new algorithm special. It may seem that removing columns takes us away from the desired result, but in reality, it is what allows the construction of high-quality CPHFs. It is relevant to highlight that the algorithm does not remove just one column from the ongoing solution; it can remove several columns in a single iteration of the outer while loop.
The algorithm ends when the maximum number of columns to add is reached and none of the $p$ arrays with $k - 1, \ldots, k - p$ columns improves the corresponding best global solution.
Algorithm 1 presents the pseudocode of the greedy algorithm to add columns to a given $\mathrm{CPHF}(n; k_0, v, t)$. In the algorithm, $k$ is not the number of columns of the ongoing solution, but the number of new columns that have been added to the initial CPHF with $k_0$ columns; so the ongoing solution has $k_0 + k$ columns in the algorithm.
Algorithm 1: greedyExtension($A$, $s$, numCandidates, $\delta$)
1 bestGlobalSolutions ← vector($s + 1$);
2 bestGlobalSolutions[0] ← $A$;
3 for $k$ ← 1 to $s$ do
[lines 4–27, which implement the main while loop (column addition at lines 12 and 17; column removal at lines 23 and 24), appear as an image in the original]
28 writeResults(bestGlobalSolutions);
At the beginning of the while loop, the ongoing solution $B$ is initialized with the best global solution with $k - 1$ new columns (i.e., $k_0 + k - 1$ columns); so each iteration of the loop always takes the best solution found so far for its number of columns, regardless of what happened in the previous iteration. This is the greedy part of the algorithm, because the algorithm does not take as the ongoing solution the one generated in the previous iteration. In some cases, the best ongoing solution with $k$ columns may be the solution generated in the previous iteration, but it may also be a solution generated several iterations ago.
There are two places in the algorithm where a solution with a given number of columns can be refined. The first place is at line 17, when a solution with $k$ new columns is obtained by adding a new column to the previous solution with $k - 1$ new columns. The second place is at line 24, when a solution with $k$ new columns is obtained by deleting columns from a solution with more columns. For the solution with $s$ new columns (i.e., $k_0 + s$ columns), the only place where it can be refined is at line 17.
Note that the algorithm is designed in such a way that the initial array may or may not be a complete CPHF, i.e., the input array can have some uncovered subarrays, and the algorithm may construct a complete CPHF. That is the reason why the input array $A$ is placed in position 0 of bestGlobalSolutions. In particular, a random CPHF with 8 columns was used as the initial solution to show the real power of the greedy extension algorithm. However, it is worth mentioning that it is possible to start with a previously constructed CPHF with any number of columns (even if the CPHF has some uncovered combinations).
Algorithm 1 uses the following helper functions:
vector($s + 1$): creates a vector of size $s + 1$.
randomColumn($n$, $v$, $t$): generates a column vector of size $n$, where each element is from the set $\mathbb{F}_v^t$.
concatenateArrays($B$, $C$): creates and returns an array that contains the columns of $B$ followed by the columns of $C$; the input arrays are not modified.
uncoveredSubarrays(X): returns the number of uncovered subarrays of $t$ columns in the input array X (see the sketch after this list).
removeWorstColumn(E): removes from the array E the column that appears the most times in uncovered subarrays. The input array has one column less after the function call. If there is more than one column with the highest number of occurrences in uncovered subarrays, then one of those columns is selected randomly.
writeResults(bestGlobalSolutions): writes the best global solution for each number of columns $k_0, k_0 + 1, \ldots, k_0 + s$.
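A direct (non-incremental) implementation of uncoveredSubarrays() enumerates the $\binom{k}{t}$ subsets of columns and tests, for each subset, whether some row of the CPHF yields a nonsingular $t \times t$ matrix. The following C sketch is our reconstruction for a prime order, with the CPHF stored as A[row][column][coordinate] and with tiny sizes chosen for the demonstration:

```c
#include <stdbool.h>
#include <stdio.h>

#define V 7        /* prime order */
#define T 2        /* strength (kept tiny for the demo) */
#define NROWS 2    /* CPHF rows */
#define K 4        /* CPHF columns */

/* Nonsingularity of a T x T matrix over F_V (Gaussian elimination). */
static bool nonsingular(int m[T][T]) {
    for (int c = 0; c < T; c++) {
        int piv = -1;
        for (int r = c; r < T; r++)
            if (m[r][c] != 0) { piv = r; break; }
        if (piv < 0) return false;
        for (int j = 0; j < T; j++) {
            int tmp = m[c][j]; m[c][j] = m[piv][j]; m[piv][j] = tmp;
        }
        int inv = 1;
        for (int x = 1; x < V; x++)
            if (m[c][c] * x % V == 1) { inv = x; break; }
        for (int r = c + 1; r < T; r++) {
            int f = m[r][c] * inv % V;
            for (int j = c; j < T; j++)
                m[r][j] = ((m[r][j] - f * m[c][j]) % V + V) % V;
        }
    }
    return true;
}

/* Counts the t-subsets of columns that contain no covering tuple. */
static int uncoveredSubarrays(int A[NROWS][K][T]) {
    int sel[T], uncovered = 0;
    for (int j = 0; j < T; j++) sel[j] = j;       /* first t-subset */
    while (1) {
        bool covered = false;
        for (int i = 0; i < NROWS && !covered; i++) {
            int m[T][T];                          /* rows: the t elements */
            for (int r = 0; r < T; r++)
                for (int l = 0; l < T; l++)
                    m[r][l] = A[i][sel[r]][l];
            covered = nonsingular(m);
        }
        if (!covered) uncovered++;
        int j = T - 1;                            /* next t-subset */
        while (j >= 0 && sel[j] == K - T + j) j--;
        if (j < 0) break;
        sel[j]++;
        for (int l = j + 1; l < T; l++) sel[l] = sel[l - 1] + 1;
    }
    return uncovered;
}

int main(void) {
    int A[NROWS][K][T] = {       /* the column pair {2,3} is uncovered: */
        { {1, 0}, {0, 1}, {1, 1}, {2, 2} },   /* row 0 gives det 0      */
        { {1, 3}, {2, 1}, {0, 0}, {5, 1} },   /* row 1 gives det 0      */
    };
    printf("uncovered subarrays: %d\n", uncoveredSubarrays(A));  /* 1 */
    return 0;
}
```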
The functions uncoveredSubarrays() and removeWorstColumn() are the most complex and require the most execution time. Suppose that the input array for uncoveredSubarrays() has $k$ columns; then, the function needs to check all $\binom{k}{t}$ subarrays of $t$ columns to determine how many are uncovered. In the removeWorstColumn() function, the situation is similar because the function needs to find which of the $\binom{k}{t}$ subarrays are uncovered in order to determine the column involved in the largest number of uncovered subarrays.
If $k$ and/or $t$ are large, then the number of subarrays to be processed can be huge, and so the functions uncoveredSubarrays() and removeWorstColumn() could take a long time to run. This is an important factor to consider because uncoveredSubarrays() is invoked several times, and removeWorstColumn() is invoked $p$ times in every iteration of the main while loop.
For these reasons, a small implementation refinement of the above algorithm was made. Together with each best global solution, the list of the subarrays that are uncovered in that solution was stored. On line 12, only the new subarrays of $t$ columns that are created by the addition of the new column $C$ to the ongoing solution $B$ are verified. Let $L$ be the list of the new subarrays that do not have a covering tuple as a row. Then, the list of uncovered subarrays for $D$ is obtained by concatenating the list of uncovered subarrays for $B$ with the list $L$. On line 23, the list of uncovered subarrays was used to efficiently find the worst column of an array.
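The savings from this refinement can be quantified: when a column is appended to an array with $k$ columns, only the $\binom{k}{t-1}$ subarrays of $t$ columns that involve the new column must be checked, out of the $\binom{k+1}{t}$ subarrays of the extended array. A small C computation with illustrative sizes:

```c
#include <stdio.h>

/* binomial coefficient C(n, r) in 64-bit arithmetic */
static unsigned long long binom(unsigned n, unsigned r) {
    unsigned long long b = 1;
    for (unsigned i = 1; i <= r; i++)
        b = b * (n - r + i) / i;
    return b;
}

int main(void) {
    unsigned k = 100, t = 6;                             /* illustrative sizes */
    printf("new subarrays:   %llu\n", binom(k, t - 1));  /* C(100,5) = 75,287,520    */
    printf("total subarrays: %llu\n", binom(k + 1, t));  /* C(101,6) = 1,267,339,920 */
    return 0;
}
```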
4. Computational Results
Before presenting the main computational results, one of the cases where the greedy extension algorithm was able to achieve a new best result will be described. The initial CPHF is a non-restricted CPHF of strength four that was built with the algorithm in [19]. Algorithm 1 was executed with this CPHF as the input array and with numCandidates = 15,000.
Table 3 shows the evolution of the number of uncovered subarrays for each number of columns from 33 to 45. The first column indicates the current number of columns; it can be seen how this value increases and decreases as the algorithm executes. In each row, the new best global result achieved for a number of columns is enclosed in a box. It can be seen that the algorithm does not always obtain a new best global result for the number of columns immediately before or after the last refined one. In this example, our algorithm constructs a CPHF with 45 columns, which is one of the results reported in Table 5. After this CPHF was built, the algorithm continued its execution, but it was not able to reduce to zero the number of uncovered subarrays for a CPHF with 46 columns. For this reason, the rest of the iterations are not shown.
In the following, a description of the main computational results obtained with Algorithm 1 is presented. In most cases, the initial solution is a random CPHF with eight columns. The algorithm can construct both non-restricted and restricted CPHFs.
Since the candidate columns are randomly generated, the algorithm is not deterministic. Moreover, it is possible to make several runs with an initial randomly generated CPHF but with different values for the other three parameters. In the computational experimentation performed to obtain the results in this paper, the values of the parameters are as follows: numCandidates = 15,000, and the value of $s$ varies between 10 and 2000 depending on the cardinality of the row set of the initial CPHF; the higher the cardinality of the row set, the higher the value of $s$ (since our algorithm accepts any input CPHF, the result of one run can be used as the input of another run to add more columns).
As mentioned above, the focus was only on constructing CPHFs of order $v = 7$ and strengths from 3 to 6 (since excellent upper bounds have previously been reported for $v \in \{2, 3, 4, 5\}$, the next prime power is 7; but it is possible to run the algorithm with any alphabet that is a prime power). As shown in Table 4, Table 5, Table 6 and Table 7, a large number of new results were achieved; this seems to be good proof of the effectiveness of the proposed algorithm. The results are compared against the ones reported in [8].
Table 4 shows the important results for CPHFs of strength 3. The constructed CPHF is shown in the first column of the table, and the covering array that is generated from the CPHF is given in the second column. The CPHFs in the first column can be restricted or non-restricted. When there is more than one CPHF for the same cardinality of the set of rows, the ones with fewer columns are the restricted ones. As mentioned above, a CPHF can satisfy one or more restrictions, so CPHFs with the same row-set cardinalities can produce CAs with different row-set cardinalities. The third column of the table indicates the row-set cardinality of the previously known CA with the same number of columns, order, and strength as the covering array in the second column; thus, for example, the value in the third column of the first row indicates that the previously known covering array with the same number of columns, order $v = 7$, and strength $t = 3$ had 1014 rows, but the CA generated from our CPHF has 973 rows.
The last column of the table shows how many covering array numbers (CANs) were refined by the new result in the second column. The covering array number for given values of $t$, $k$, and $v$, denoted by $\mathrm{CAN}(t, k, v)$, is the minimum cardinality of the set of rows $N$ for which there exists a covering array with precisely $N$ rows, strength $t$, $k$ columns, and order $v$. The minimum $N$ for given values of $t$, $k$, and $v$ is unknown in general. In the covering array number notation, the information in the first row of Table 4 indicates that the previously known result was an upper bound of 1014 on the corresponding covering array number, and that the new result is 973; that is, our result improves the upper bound from 1014 to 973. The value 7 in the last column of the first row indicates that the new covering array improves the known upper bounds of seven covering array numbers, because deleting columns from a CA yields CAs for all smaller numbers of columns. In general, in any row of the table, let $a$ and $b$, respectively, be the cardinalities of the sets of rows and columns of the covering array in the second column of the table, and let $c$ be the value in the fourth column; then $a$ is the new bound for the $c$ covering array numbers $\mathrm{CAN}(t, b, v), \mathrm{CAN}(t, b-1, v), \ldots, \mathrm{CAN}(t, b-c+1, v)$. The previous best bounds of the covering array numbers were obtained from the covering array tables [8].
Table 4 reports 19 new CPHFs whose derived covering arrays refined 903 covering array numbers.
Similar to Table 4, Table 5 displays the most important results for strength $t = 4$. In this case, the table contains 59 CPHFs, and their derived CAs refined 8910 best sizes of CANs. This is remarkable since the covering array tables contain results for CAs with up to 10,000 columns, and our algorithm improved 8910 CANs. That is, our algorithm improved most of the known results for $v = 7$ and $t = 4$.
Table 6 shows the computational results for strength five. The number of new CPHFs is 39, and the number of new upper bounds of CANs is 1957. The latter number is considerably smaller than the respective number for strength $t = 4$; the reason is the time constraints for processing CPHFs with thousands of columns. However, it was a good achievement to construct a CPHF with 1999 columns, considering that it has $\binom{1999}{5} = 264{,}672{,}325{,}837{,}899$ submatrices of $t = 5$ columns. The obtained results suggest that our algorithm can refine the best-known sizes of CANs with more than 2000 columns, but it was not possible to extend the computational experimentation beyond 2000 columns for $t = 5$.
The main results for strength $t = 6$ are shown in Table 7. The table contains 35 new CPHFs that are responsible for the improvement of 786 best CAN upper bounds. In this case, it was possible to process CPHFs with a smaller maximum number of columns than for strength five. As for strength $t = 5$, the results indicate that our algorithm could improve the current best covering arrays with more columns, but more computer resources would be needed.
In total, Table 4, Table 5, Table 6 and Table 7 contain 152 new CPHFs, which allow the improvement of 12,556 best sizes of covering array numbers. All the algorithms needed to obtain the results were coded in the C language, run under a UNIX environment, and compiled using GCC 11.4.0.
5. Conclusions
The main research question addressed in this investigation concerns the improvement of upper bounds on the size of covering arrays of alphabet 7. This question was answered positively with the definition of 12,556 new upper bounds of covering array numbers. The two reasons for focusing on order 7 were as follows: (a) the representation used is a covering perfect hash family, which is defined only for prime power alphabets; (b) excellent upper bounds have been reported for alphabets 2, 3, 4, and 5; ergo, the next prime power alphabet is 7. The algorithm used to improve the upper bounds for covering arrays was a greedy column-extension algorithm that uses a column addition/removal process and starts with a random CPHF with eight columns. Our proposal allowed the construction of 152 new CPHFs of strengths 3, 4, 5, and 6. For strength three, 19 new CPHFs were built, and the derived CAs improved the upper bounds of 903 CANs; for strength four, 59 new CPHFs were constructed, and a total of 8910 new upper bounds of CANs were obtained; for strength five, the number of new CPHFs was 39 and the number of improved bounds of CANs was 1957; and finally, for strength six, the number of new CPHFs was 35 and the number of improved bounds of CANs was 786. In total, 152 new CPHFs were built, allowing the improvement of 12,556 upper bounds of CANs.