Abstract
This paper considers two-level regular fractional factorial designs for baseline parameterization. Some new results that reveal relationships between the K-values and word length pattern are developed. The new results help find two-level regular fractional factorial designs that are likely to be optimal under the K-aberration criterion. Illustrative examples are included to demonstrate this point.
1. Introduction
The application of experimental design in information theory is primarily reflected in optimizing experimental schemes to maximize information acquisition efficiency while minimizing uncertainty. The core concepts of information theory, such as entropy, provide quantitative tools for experimental design. For example, in the field of communications, experimental design can optimize channel coding schemes, thereby improving transmission reliability. Additionally, in machine learning feature selection, experimental design based on information entropy can identify the most discriminative feature subsets, reducing model complexity and enhancing classification accuracy. The integration of experimental design and information theory enables a systematic approach to solving key challenges in information acquisition, processing, and optimization, offering theoretical guidance for engineering and scientific experiments. Bose explored the interplay between statistical experimental design and information-theoretic concepts, discussing how to construct designs that maximize informational yield from data [1]. His work established parallels between statistical efficiency (particularly variance minimization in parameter estimates) and information-theoretic principles.
Factorial designs are widely applied in various fields. The baseline parameterization (BP) model and the orthogonal parameterization (OP) model are two of the most widely used models in the analysis of experimental data. The BP model is a linear model based on baseline constraints, while the OP model is based on zero-sum constraints. There are numerous research findings under the OP model; for details, please refer to [2,3] and the references therein. BP is a natural choice for modeling experiments in which each factor has a null state or baseline level. For example, in toxicology experiments [4], each binary factor represents the presence or absence of a particular toxin. Scientists consider the absence of all toxins to be the natural reference for the possible presence of some toxins. In such experiments, the absence of a toxin can naturally be regarded as a baseline level. Another practical example is introduced by Glonek and Solomon [5] in a leukemic mice experiment. The authors presented two binary factors that stand for sample and time, respectively. A natural baseline level is the non-leukemic line for one factor and time zero for the other factor. For a broader interpretation of BP, one is referred to [6,7].
In recent years, choosing efficient designs under BP has attracted considerable attention from researchers. D-optimality and A-optimality are two commonly used efficiency criteria for selecting optimal experimental designs under the main effect model. From an information perspective, they aim to maximize the information content of the experimental data or minimize the variance of parameter estimates. Mukerjee and Tang [7] highlighted the A-optimality of two-level orthogonal arrays for BP when the interaction effects are all absent. Mukerjee and Huda [8] applied approximate theory together with discretization procedures to find designs that have high A-efficiencies and are robust to model misspecification. Liu et al. [9] employed the D-optimality criterion to find designs under BP that are both efficient and robust.
For situations where the interaction effects are present, the experimenter is usually concerned about the bias caused by interaction effects in estimating the main effects. Karunanayaka and Tang [10] proposed adding runs to one-factor-at-a-time designs to generate compromise designs that are competitive under both the efficiency criterion and the bias criterion. Mukerjee and Tang [7] proposed an optimality criterion, the K-aberration criterion, which quantifies the bias for two-level designs. Under the K-aberration criterion, several works on designs for BP have emerged. Li et al. [11] proposed an efficient incomplete search algorithm for finding nearly optimal designs and tabulated some 20-run (nearly) optimal two-level designs. Miller and Tang [12] focused on identifying efficient two-level regular designs by bridging the K-values and the word length pattern. Mukerjee and Tang [13] developed certain rank conditions which, in conjunction with the ideas of minimum moment aberration and recursive sets, help alleviate the burden of finding optimal two-level regular designs. Lin and Yang [14] considered finding multistratum designs for BP by using the coordinate-exchange algorithm. Li et al. [15] further proposed a theoretical construction method for compromise designs. Chen et al. [16] considered situations where some two-factor interactions are also of interest in addition to the main effects and proposed an algorithm for searching for optimal designs under their proposed minimum aberration criterion. Sun and Tang [17] investigated the linear relationship between the effects under BP and those under OP and explored its applications to design construction under BP in terms of estimability, optimality, and robustness. Yan and Zhao [18,19] put forward a minimum aberration criterion for three-level designs for BP and found some optimal designs using their proposed construction algorithm.
This paper focuses on two-level regular designs under the K-aberration criterion. Although two-level nonregular designs may outperform regular ones in some cases, there is a compelling reason for considering two-level regular designs, as justified in [13]: the results on two-level regular designs serve as a benchmark against which further work on nonregular designs can be compared. Given the importance of two-level regular designs for BP, we endeavor to make further progress on bridging the K-values and the word length pattern, and we analytically calculate the quantities for two-level regular designs with a higher resolution based on the results in [13]. This progress has the following advantages: (i) it helps screen out candidate designs that are unlikely to be K-aberration optimal and thus further simplifies the search algorithm in [13]; (ii) it can find two-level regular designs whose K-aberration characteristics are better than those identified by the results in [13], and in some cases optimal. Illustrative examples are given to demonstrate these points.
The rest of this paper is organized as follows. Section 2 introduces some necessary notation and elementary knowledge of BP and the word length pattern. Section 3 presents the main results of this paper. Applications of the theoretical results are included in Section 4. The concluding remarks are given in Section 5.
2. Preliminaries
Consider an experiment with n factors each at two levels. A full design includes runs that correspond to level combinations of the n factors. Suppose only a fraction of the full design can be carried out for economic reasons. This paper considers the regular fraction of the full design. Let denote a two-level regular fractional factorial design D with runs and n columns, with each column at two levels 0 and 1; such a design can be obtained as follows: Let . Define the matrix
with columns arranged in Yates order, where
are q independent columns, and the other columns are obtained by taking the component-wise sum (modulo 2) of the independent ones, and say . A regular design D can be obtained by selecting n columns of such that q are independent and the other columns are component-wise sums (modulo 2) of the q independent ones. Clearly, the design D is an submatrix of .
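The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `yates_matrix`, the tuple labels, and the choice of q = 3 with the four selected columns 1, 2, 3, and 123 are assumptions made for the example.

```python
import itertools
import numpy as np

def yates_matrix(q):
    """Build the 2^q x (2^q - 1) matrix of all nonzero mod-2 column sums.

    Columns are arranged in Yates order: 1, 2, 12, 3, 13, 23, 123, ...
    Returns the 0/1 matrix and a list of labels (tuples of basic factors).
    """
    # Full factorial in the q independent (basic) columns.
    runs = np.array(list(itertools.product([0, 1], repeat=q)), dtype=int)
    cols, labels = [], []
    for j in range(q):
        new_col = runs[:, j]
        new_label = (j + 1,)
        # Yates order: the new basic column first, then its component-wise
        # sum (mod 2) with every column generated so far.
        batch_cols = [new_col] + [(c + new_col) % 2 for c in cols]
        batch_labels = [new_label] + [lab + new_label for lab in labels]
        cols += batch_cols
        labels += batch_labels
    return np.column_stack(cols), labels

# Hypothetical example: q = 3 independent columns, n = 4 selected columns.
H, labels = yates_matrix(3)
chosen = [labels.index(lab) for lab in [(1,), (2,), (3,), (1, 2, 3)]]
D = H[:, chosen]  # an 8-run regular design with 4 columns
```

Selecting columns by label rather than by position keeps the link to the Yates-order notation explicit; any other set of n columns containing q independent ones could be chosen in the same way.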
For a regular design D, we regard it as a set with elements being the n columns of D and denote still the set as D without causing confusion. Let denote a subset of s columns of D and denote the collection of all the possible , where . For ease of presentation, is sometimes used instead of without causing confusion. Then for any , is also an submatrix of D. Note that each row of is an s-tuple with entries of 0 or 1; we call it a binary s-tuple. Let be the number of times that the s-tuple occurs in the submatrix . For a regular design D, let denote the total bias to the main effects estimations caused by all the s-factor interaction effects. Mukerjee and Tang [7] proved that
where
and denotes the collection of all the possible for given . Formula (2) applies to any two-level design, not just regular ones. For more details of the K-aberration, one is referred to [7]. A two-level orthogonal array is K-aberration optimal if it sequentially minimizes the following sequence:
among all the two-level designs, given the run size and the number of columns.
The word length pattern (WLP) is a concept proposed for designs under OP; for its original definition, one is referred to [20]. With a slight modification, the concept can also be applied to two-level regular designs under BP as follows. For any regular baseline design D, denote as the sum of the components of the vector (modulo 2), where . Define
If the k columns in satisfy (modulo 2) or , then we have or 0, which leads to and the k columns correspond to a defining word, where and denote the N-vectors with all components being 1 or 0, respectively. If the k columns in do not correspond to a defining word, then these columns must be independent of each other. This means that all of the binary k-tuples appear equally often as rows in , which leads to . Let , then is the number of defining words of length k of the regular design D. For a given two-level regular design, we call the corresponding sequence
as its word length pattern and t as its resolution, if the first nonzero element in Sequence (5) is , where .
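To make the definition concrete, the following Python sketch enumerates the defining words of a small 0/1 design by brute force and tabulates their lengths. It is a minimal illustration under stated assumptions: the function names are hypothetical, the word length pattern is returned as a dictionary keyed by word length rather than as the sequence used in the paper, and the exhaustive search over column subsets is only practical for small n.

```python
import itertools
from collections import Counter
import numpy as np

def defining_words(D):
    """Return (columns, constant) for each column subset whose component-wise
    sum (mod 2) is the all-zeros vector (constant 0) or all-ones vector (1)."""
    n = D.shape[1]
    words = []
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            s = D[:, list(cols)].sum(axis=1) % 2
            if np.all(s == s[0]):
                words.append((cols, int(s[0])))
    return words

def word_length_pattern(D):
    """Map each word length to the number of defining words of that length."""
    return dict(Counter(len(cols) for cols, _ in defining_words(D)))

def resolution(D):
    """Length of the shortest defining word (None if there is none)."""
    wlp = word_length_pattern(D)
    return min(wlp) if wlp else None

# Hypothetical 8-run examples with columns a, b, c and a fourth column.
base = np.array(list(itertools.product([0, 1], repeat=3)), dtype=int)
D0 = np.column_stack([base, base.sum(axis=1) % 2])        # four columns sum to 0_N
D1 = np.column_stack([base, (base.sum(axis=1) + 1) % 2])  # four columns sum to 1_N
```

Both `D0` and `D1` have the same word length pattern and resolution; they differ only in the constant attached to their single defining word, which is exactly the extra information that matters under BP.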
3. Main Results
Note that both Sequences (4) and (5) are closely related to the collection of a regular design. Recalling the definition of in (3), when an s-column collection contains no defining word, all of the binary s-tuples appear exactly times in , which means that such an contributes to . When an s-column collection contains some defining words, i.e., the columns in form some defining words, the analyses for the contribution to caused by such an become complex. Similar analyses are also required when considering . Therefore, it is necessary to investigate the possible cases of a given collection containing defining words, so as to calculate . Suppose a regular design has a resolution of t. Lemma 1 presents the maximum number of defining words in each of its -column collections .
Lemma 1.
Suppose D is a regular design of resolution t and . Then
- (i)
- contains at most two independent defining words for ;
- (ii)
- contains at most one defining word for ,
where .
Proof.
(i) For , denote . If contains three independent defining words , , and , then , , and generate another four defining words , , , and . These seven defining words contain at least 21 letters (columns) since each defining word contains at least letters. Note that a letter appears at most four times among the seven defining words. Then the seven defining words contain no more than 20 letters since there are only five columns in . This contradiction shows the validity of (i) for . For , the proof is similar.
(ii) If contains two defining words and with the length and , respectively, then and have at least common letters. Since and , the length of the defining word is at most , which contradicts . This completes the proof of (ii). □
Before proceeding to the main results of this section, we first introduce a lemma that is a refinement of the results from [12,21].
Lemma 2.
Denote as j columns from a regular design D of resolution t. Suppose, among these j columns, only the first i ones correspond to a defining word, i.e., or . Then, we have the following:
- (i)
- The rows in the i-column matrix must consist of copies of a half replicate of the full factorial design;
- (ii)
- Furthermore, in matrix , for the copies of each distinct row of matrix , all the distinct rows of matrix appear equally often,
where .
In (i) of Lemma 2, the half replicate of the full factorial design that the i-column matrix contains depends on whether or (modulo 2). This is addressed in detail in Remark 1.
Remark 1.
In Lemma 2, if (modulo 2), then all the possible i-tuples that contain an even number of ones appear times in the i-column matrix . If (modulo 2), then all the possible i-tuples that contain an odd number of ones appear times in the i-column matrix .
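Remark 1 can be checked numerically on a small case. The sketch below is an illustration only, with all names hypothetical: it takes a 16-run full factorial in four basic columns, forms a four-column matrix whose columns sum to the all-zeros vector (and a second one whose columns sum to the all-ones vector), and counts how often each distinct row occurs.

```python
from collections import Counter
import itertools
import numpy as np

# 16-run full factorial in four basic columns; only the first three are used
# to build the word, so each distinct row is repeated.
base = np.array(list(itertools.product([0, 1], repeat=4)), dtype=int)
abc = base[:, :3]
d_zero = abc.sum(axis=1) % 2       # a + b + c: the four columns sum to 0_N
d_one = (d_zero + 1) % 2           # complement: the four columns sum to 1_N

M0 = np.column_stack([abc, d_zero])
M1 = np.column_stack([abc, d_one])

def row_counts(M):
    """Count occurrences of each distinct row of a 0/1 matrix."""
    return Counter(tuple(int(x) for x in row) for row in M)

counts0 = row_counts(M0)
counts1 = row_counts(M1)

for name, counts in [("sum = 0_N", counts0), ("sum = 1_N", counts1)]:
    parities = {sum(row) % 2 for row in counts}
    print(name, "distinct rows:", len(counts),
          "row-weight parities:", parities,
          "multiplicities:", set(counts.values()))
```

In both cases the eight distinct rows form a half replicate of the 2^4 full factorial and occur equally often; the all-ones row appears only in the matrix whose columns sum to the all-zeros vector, which is the fact used in cases (a1) and (a2) of the proof of Theorem 1.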
For a defining word W, let denote the vector generated by taking component-wise sums (modulo 2) of the columns in the defining word. Then, or . Denote and as the numbers of length i defining words with and , respectively, where . Clearly, Denote as the number of pairs of length four defining words that have two common columns and , and as the number of pairs of length four defining words that have two common columns and . Theorem 1 builds the bridge between and the WLP for .
Theorem 1.
Suppose D is a regular design with resolution 4. Then
Proof.
We calculate and , separately. For , suppose . Since D has resolution 4, contains at most one defining word. There are five scenarios as follows:
- (a1)
- contains one length-four defining word W with ;
- (a2)
- contains one length-four defining word W with ;
- (a3)
- contains one length-five defining word W with ;
- (a4)
- contains one length-five defining word W with ;
- (a5)
- The five columns in are independent of each other.
For (a1), suppose the defining word is (modulo 2) without loss of generality. According to Lemma 2 (i) and Remark 1, the rows that consist of four ones appear times in the four-column matrix . Therefore, the columns of entire ones appear times in the five-column matrix , i.e., .
For (a2), suppose the defining word is (modulo 2) without loss of generality. According to Lemma 2 (i) and Remark 1, the four-column matrix contains only rows that consist of an odd number of ones. This implies that none of the rows in matrix contains entire ones, i.e., .
With similar arguments to (a1) and (a2), we can obtain and for (a3) and (a4), respectively. It is obvious that for (a5). The number of s belonging to (a1)–(a5) are , , , , and , respectively. With the analysis above, it yields that
Now, we consider calculating . According to Lemma 1 (i) for , there are nine possibilities for :
- (b1)
- contains two independent defining words and , which generate the third defining word . Each of the three defining words has a length of four, one has and the other two have ;
- (b2)
- contains two independent defining words and , which generate the third defining word . Each of the three defining words has a length of four and ;
- (b3)
- contains only one defining word with a length of four and ;
- (b4)
- contains only one defining word with a length of four and ;
- (b5)
- contains only one defining word with a length of five and ;
- (b6)
- contains only one defining word with a length of five and ;
- (b7)
- contains only one defining word with a length of six and ;
- (b8)
- contains only one defining word with a length of six and ;
- (b9)
- The six columns in are independent of each other.
For (b1)–(b9), denote as any five-column subset of , i.e., . Now we proceed to investigate the values of and the number of s in each of the cases for (b1)–(b9).
For (b1), since there is a length-four defining word, say , with in , none of the rows in the matrix consisting of the columns involved in contains entire ones and thus . With careful checking, each must contain only one defining word, say W, which has a length of four with either or . For the s of the former case, we have and there are two such s in . For the s of the latter case, we have and there are four such s in . Note that the in (b1) contains a pair of length-four defining words that have two columns in common and their . Recalling the meaning of , we conclude that the number of s belonging to (b1) is .
For (b2), since contains three length-four defining words, each of which has , we have according to Lemma 2 (ii). Note that each must contain one of these three defining words. From Lemma 2 (ii), we have for each and there are six such s in . Note that there are three pairs of length-four defining words in and each pair has two columns in common. Thus, there are in total s in (b2).
For (b3), it is easy to obtain that . Each contains either a length-four defining word with or five independent columns. For the s of the former case, we have and there are two such s in . For the s of the latter case, we have and there are four such s in . Now we investigate the number of s that belong to (b3). The four columns of each of the defining words joined with any two of the remaining columns of D induce an . Notably, the three defining words in each of case (b2) induce exactly the itself. Similarly, the in case (b1) can be induced by the length-four defining word with in it. Therefore, the number of s in case (b3) is .
For (b4), we have as there is a length-four defining word with . With a similar analysis to (b3), among the six s in , two of them have and four of them have , depending on whether the contains a length-four defining word with or not. The four columns of each of the defining words joined with any two of the remaining columns of D induce an in . One thing to note is that the two length-four defining words with in each of case (b1) induce exactly the itself. The number of s belonging to (b4) is .
For cases (b5)–(b8), the results on the values of s, the number of s belonging to each case, the values of s, and the number of s in each , which have the same values of s, are straightforward. These results are summarized in Table 1 along with those for cases (b1)–(b4), where the notation represents the number of s of each value. Note that each in (b9) has and for each ; this results in . Therefore, there is no need to consider case (b9) when calculating .
Table 1.
and for calculating in Theorem 1.
Theorem 2 below builds the relationship between and the WLP for .
Theorem 2.
Suppose D is a regular design with resolution , then
for an odd , and
for an even .
Proof.
To calculate in (2) for , consider five possibilities for :
- (c1)
- contains only one defining word with length t and ;
- (c2)
- contains only one defining word with length t and ;
- (c3)
- contains only one defining word with length and ;
- (c4)
- contains only one defining word with length and ;
- (c5)
- consists of columns that are independent of each other.
With similar analyses to Theorem 1, we have Table 2 and Table 3 for calculating for an odd and even , respectively. With Table 2 and Table 3, we obtain that
for an odd , and
for an even .
Table 2.
for calculating for an odd in Theorem 2.
Table 3.
for calculating for an even in Theorem 2.
Considering , there are seven possibilities for :
- (d1)
- contains only one defining word and its length is t with ;
- (d2)
- contains only one defining word and its length is t with ;
- (d3)
- contains only one defining word and its length is with ;
- (d4)
- contains only one defining word and its length is with ;
- (d5)
- contains only one defining word and its length is with ;
- (d6)
- contains only one defining word and its length is with ;
- (d7)
- consists of columns that are independent of each other.
With similar analyses to Theorem 1, we have Table 4 and Table 5 for calculating for an odd and even , respectively. Note that each in (d7) has and for each ; this results in . Therefore, there is no need to consider case (d7) when calculating .
Table 4.
and for calculating for an odd in Theorem 2.
Table 5.
and for calculating for an even in Theorem 2.
Remark 2.
Theorems 1 and 2 establish relationships between the K-aberration and WLP that are further developments based on the work in [12]. Theorems 1 and 2 help narrow down the choice of finding optimal regular designs. Moreover, for some situations, Theorems 1 and 2 are capable of identifying the optimal ones. This point will be demonstrated in Section 4.
4. Applications
It is worth noting that the concept of isomorphism for designs under BP is different from that under OP. Under OP, two designs are called isomorphic if one can be obtained from the other by permuting columns, permuting rows, or switching symbols. However, the symbols of two-level designs are not interchangeable under BP. Hence, two designs are called isomorphic under BP if one can be obtained from the other by permuting columns or permuting rows. Hereafter, we use the terms OP regular designs and BP regular designs to distinguish the two settings. Clearly, switching the symbols of some columns of an OP regular design may result in nonisomorphic BP designs. In the following, we illustrate how to find BP regular designs that have desirable K-aberration characteristics by using the catalogs of nonisomorphic OP regular designs displayed in [22].
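A small numerical check illustrates why symbol switching matters under BP. The sketch below is an assumption-laden illustration, not part of the paper's method: it uses a hypothetical 8-run design with four columns, switches the symbols of one column, and compares the number of all-ones rows, a quantity that is unchanged by any row or column permutation.

```python
import itertools
import numpy as np

# Hypothetical 8-run design: columns a, b, c and a + b + c (mod 2).
base = np.array(list(itertools.product([0, 1], repeat=3)), dtype=int)
D = np.column_stack([base, base.sum(axis=1) % 2])

# Switch the symbols (0 <-> 1) of the first column only.
D_switched = D.copy()
D_switched[:, 0] = 1 - D_switched[:, 0]

def all_ones_rows(M):
    """Number of rows consisting entirely of ones; invariant under
    row permutations and column permutations."""
    return int(np.sum(M.sum(axis=1) == M.shape[1]))

def word_constant(M):
    """Constant value of the mod-2 sum of all columns (None if not constant)."""
    s = M.sum(axis=1) % 2
    return int(s[0]) if np.all(s == s[0]) else None

print(all_ones_rows(D), word_constant(D))
print(all_ones_rows(D_switched), word_constant(D_switched))
```

Because the two designs have different numbers of all-ones rows, no combination of row and column permutations maps one to the other, so they are nonisomorphic under BP even though they are isomorphic under OP. Switching one column flips the constant of the defining word that contains it.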
Consider finding desirable BP regular designs under K-aberration. By checking the catalogs of nonisomorphic OP designs displayed in [22], all the OP regular designs have a resolution of either or 4. According to Theorem 1 in [12], any OP regular design with a resolution of has a smaller than those with a resolution of , noting that for and for . Among the OP regular designs of resolution , the designs with the minimum have a smaller according to Theorem 1 (b) in [12]. According to [22], the unique OP regular design with the minimum , denoted as , is determined by the following ten independent defining words: , , , , , , , , , and , where are the 1st, 2nd, th, th columns of the matrix in (1) with . The s of these ten defining words equaling to or determines BP regular designs that may have different K-aberration performances. According to Theorem 2 of [12], among the BP designs, those with the minimum have a smaller . For example, the following two BP regular designs and have the minimum , which results in the minimum :
Although and have the same value of , they can be discriminated with respect to by applying Theorem 1. Compared to , has the same and but a smaller than . This means that has a smaller than according to Theorem 1. As a confirmation, we calculate the values of the previously stated BP regular designs, and it transpires that is one of the K-aberration optimal BP regular designs.
Here is an example of the application of Theorem 2. Consider finding BP regular designs that have desirable K-aberration characteristics. By checking the catalogs of nonisomorphic OP regular designs provided in [22], we only need to consider the OP regular design, denoted as , determined by the independent defining words , , and , since it has and the minimum among all the nonisomorphic regular designs, where is the th column of the matrix in (1) with , . There are BP regular designs associated with depending on whether the previously mentioned three defining words are equal to or . Among these regular BP designs, those with the minimum have the minimum according to Theorem 2 of [12]. For example, the regular BP design, denoted as , which is determined by the defining words , , and , has the minimum and then the minimum . At the same time, has the minimum , which indicates that has the minimum according to Theorem 2. As a confirmation, we calculate the values of all the BP regular designs and find that is one of the K-aberration optimal BP regular designs.
The two examples above show that Theorems 1 and 2 can help to filter or select designs. Take Theorem 1 as an example. When the experimenters need to compare designs with resolution 4, they can firstly calculate , , , , , , and according to the defining words of the designs and then according to Theorem 1. Clearly, this is time-saving compared with calculating among the designs.
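The bookkeeping described here, counting defining words by length and by the constant they equal and counting pairs of length-four words that share two columns, can be sketched as follows. This is a hedged illustration rather than the authors' Algorithm 1: the function names and the six-column example are hypothetical, generators are assumed to be independent, and the tallies are keyed explicitly by (length, constant) so that no particular notation from the theorems is presumed.

```python
from collections import Counter
from itertools import combinations

def all_words(generators):
    """Expand independent generators into all defining words.

    Each generator is (frozenset of column labels, constant in {0, 1}).
    A product of words is the symmetric difference of their column sets,
    and its constant is the mod-2 sum of the constants.
    """
    words = {}
    m = len(generators)
    for mask in range(1, 2 ** m):
        cols, const = frozenset(), 0
        for i in range(m):
            if (mask >> i) & 1:
                cols = cols ^ generators[i][0]
                const ^= generators[i][1]
        words[cols] = const
    return words

def signed_counts(words):
    """Tally defining words by (length, constant)."""
    return Counter((len(cols), const) for cols, const in words.items())

def length4_pairs_sharing_two(words):
    """Tally pairs of length-four words with exactly two common columns,
    keyed by the sorted pair of their constants."""
    four = [(c, s) for c, s in words.items() if len(c) == 4]
    tally = Counter()
    for (c1, s1), (c2, s2) in combinations(four, 2):
        if len(c1 & c2) == 2:
            tally[tuple(sorted((s1, s2)))] += 1
    return tally

# Hypothetical design with n = 6 columns and m = 2 generators.
gens = [(frozenset({1, 2, 3, 4}), 0), (frozenset({3, 4, 5, 6}), 1)]
words = all_words(gens)
counts = signed_counts(words)
pairs = length4_pairs_sharing_two(words)
```

Looping over all 2^m choices of generator constants and recomputing these tallies gives one way to compare the BP regular designs associated with a single OP regular design before evaluating any K-values directly.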
The results of Theorems 1 and 2 establish a relation between the K-values and word length pattern of a design. To facilitate practitioners in other fields applying the methods, the following algorithm is provided based on Theorem 1. The algorithm (Algorithm 1) can be directly extended if Theorem 2 is required.
Algorithm 1: For a given n and m, consider a design D with resolution 4.
Here, we would like to point out that, to calculate the defining words of D, one should refer to [20].
5. Concluding Remarks
In experiments where each factor has a null state or baseline level, the BP model has a natural interpretation, and finding optimal fractional factorial designs under BP becomes important. However, the number of nonisomorphic designs under BP is much larger than that under OP, which makes finding the optimal designs difficult. Together with the results in [12], the present work helps narrow down the search for optimal regular designs by bridging the K-values and the WLP. Nonregular designs are commonly used in various experiments because of their flexibility in run size. Therefore, finding optimal nonregular designs under BP deserves further study. However, as with regular designs, finding optimal nonregular designs under BP is also difficult. An algorithm that reduces the number of candidates for optimal designs would be very useful.
Author Contributions
Conceptualization, S.Z.; methodology, S.Z. and M.Q.; writing—original draft preparation, M.Q.; writing—review and editing, S.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China, grant number 12171277.
Institutional Review Board Statement
Not applicable.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Bose, R.C. On some connections between the design of experiments and information theory. Bull. Inst. Internat. Statist. 1961, 38, 257–271.
- Cheng, C.-S.; Tang, B. Theory of Nonregular Factorial Designs; CRC Press: Boca Raton, FL, USA, 2025.
- Dey, A.; Mukerjee, R. Fractional Factorial Plans; Wiley: New York, NY, USA, 2009.
- Kerr, K.F. Efficient 2^k factorial designs for blocks of size 2 with microarray applications. J. Qual. Technol. 2006, 38, 309–318.
- Glonek, G.F.V.; Solomon, P.J. Factorial and time course designs for cDNA microarray experiments. Biostatistics 2004, 5, 89–111.
- Banerjee, T.; Mukerjee, R. Optimal factorial designs for cDNA microarray experiments. Ann. Appl. Statist. 2008, 2, 366–385.
- Mukerjee, R.; Tang, B. Optimal fractions of two-level factorials under a baseline parameterization. Biometrika 2012, 99, 71–84.
- Mukerjee, R.; Huda, S. Approximate theory-aided robust efficient factorial fractions under baseline parametrization. Ann. Inst. Statist. Math. 2016, 68, 787–803.
- Liu, Y.; Ren, M.; Zhao, S.L. Robust and efficient factorial designs under baseline parametrization. Commun. Statist. Theory Methods 2025, 54, 1868–1879.
- Karunanayaka, R.C.; Tang, B. Compromise designs under baseline parameterization. J. Statist. Plann. Inference 2017, 190, 32–38.
- Li, P.; Miller, A.; Tang, B. Algorithmic search for baseline minimum aberration designs. J. Statist. Plann. Inference 2014, 149, 172–182.
- Miller, A.; Tang, B. Using regular fractions of two-level designs to find baseline designs. Statist. Sinica 2016, 26, 745–759.
- Mukerjee, R.; Tang, B. Optimal two-level regular designs under baseline parametrization via cosets and minimum moment aberration. Statist. Sinica 2016, 26, 1001–1019.
- Lin, C.Y.; Yang, P. Robust multistratum baseline design. Comput. Statist. Data Anal. 2018, 118, 98–111.
- Li, W.; Liu, M.Q.; Tang, B. A systematic construction of compromise designs under baseline parameterization. J. Statist. Plann. Inference 2022, 219, 33–42.
- Chen, A.; Sun, C.Y.; Tang, B. Selecting baseline designs using a minimum aberration criterion when some two-factor interactions are important. Statist. Theory Relat. Fields 2021, 5, 95–101.
- Sun, C.Y.; Tang, B. Relationship between orthogonal and baseline parameterizations and its application to design constructions. Statist. Sinica 2022, 32, 239–250.
- Yan, Z.H.; Zhao, S.L. Optimal fractions of three-level factorials under a baseline parameterization. Statist. Probab. Lett. 2023, 202, 109902.
- Yan, Z.H.; Zhao, S.L. Optimal s-level fractional factorial designs under baseline parameterization. J. Statist. Plann. Inference 2025, 236, 106242.
- Fries, A.; Hunter, W.G. Minimum aberration 2^(k-p) designs. Technometrics 1980, 22, 601–608.
- Deng, L.-Y.; Tang, B. Generalized resolution and minimum aberration criteria for Plackett-Burman and other nonregular factorial designs. Statist. Sinica 1999, 9, 1071–1082.
- Xu, H. Algorithmic Construction of Efficient Fractional Factorial Designs with Large Run Sizes. Available online: http://www.stat.ucla.edu/~hqxu/pub/ffd2r/ (accessed on 30 March 2025).
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).