Interep: An R Package for High-Dimensional Interaction Analysis of the Repeated Measurement Data

We introduce interep, an R package for interaction analysis of repeated measurement data with high-dimensional main and interaction effects. In G × E interaction studies, the forms of environmental factors play a critical role in determining how structured sparsity should be imposed in the high-dimensional scenario to identify important effects. Zhou et al. (2019) (PMID: 31816972) proposed a longitudinal penalization method to select main and interaction effects corresponding to the individual and group structure, respectively, which requires a mixture of individual and group level penalties. The R package interep implements generalized estimating equation (GEE)-based penalization methods with this sparsity assumption. Moreover, alternative methods have also been implemented in the package. These alternative methods merely select effects on an individual level and ignore the group-level interaction structure. In this software article, we first introduce the statistical methodology corresponding to the penalized GEE methods implemented in the package. Next, we present the usage of the core and supporting functions, which is followed by a simulation example with R codes and annotations. The R package interep is available at The Comprehensive R Archive Network (CRAN).


Introduction
In longitudinal studies, the same subjects are repeatedly measured over a set of units, such as a period of time. The correlated nature of the repeated measurements has motivated the development of a wide spectrum of approaches that account for intra-cluster interconnections [1,2]. One of the representative examples is the generalized estimating equation (GEE), originally proposed by Liang and Zeger [3]. It only depends on the first two marginal moments of repeated measurements and a working correlation matrix to fully incorporate the intra-cluster correlations of longitudinal data. Such a feature is particularly appealing when the full likelihood function is difficult to specify under non-Gaussian assumptions. In the past decade, variable selection has gradually become one of the central tasks in longitudinal studies due to the explosion of the use of high-dimensional data. In the literature, Wang et al. [4] developed the penalized generalized estimating equation (PGEE) with LASSO [5] to select important features. Although GEE and PGEE are robust when the working correlations are misspecified, the resulting estimators can be inefficient. To overcome this limitation, Cho and Qu [6] developed the penalized quadratic inference function (PQIF) method for variable selection with penalty functions such as SCAD [7], LASSO and adaptive LASSO [8], respectively.
For repeated-measurement data, regularized estimation with the above penalty functions continuously shrinks the regression coefficients towards zero. If the tuning parameters are properly chosen, regression coefficients corresponding to unimportant features will be estimated as zero, whereas those for important ones are still nonzero. Therefore, variable selection and parameter estimation can be achieved at the same time in the longitudinal setting. Despite the success, these penalization methods, including those of Wang et al. [4] and Cho and Qu [6], among many others, mainly focus on selecting the main effects and ignore the interactions. Variable selection methods that do not appropriately account for the interaction structure generally yield inaccurate estimation results and false findings [9]. Use the gene-environment (G × E) interaction as an example. The G × E interactions are typically modeled as the product terms between low-dimensional E factors and highdimensional G factors. The identification of important genetic main effects and G × E interactions are of primary interest since the E factors are usually determined based on existing studies and are not subject to selection. This essentially requires sparse group or bi-level selection [10][11][12]. With the repeated measure response, although PGEE and PQIF can still be adopted for G × E interactions by treating interactions as the main effects, they no longer lead to the optimal estimation and identification results [13,14].
The environmental factors take a very unique form in the longitudinal G × E study conducted by Zhou et al. [13]. The interaction effects were initially modeled as the product of lipidomics features and a treatment factor consisting of four different levels. Since the E factor is then represented as three dummy variables (or binary indicators), the identification of important lipids by environmental interactions requires the selection to be performed in a "group-in, group-out" manner, as in the group LASSO [15]. Such a formulation eventually leads to a mixture of individual-and group-level penalty functions used to identify genetic main effects and G × E interactions simultaneously under the longitudinal outcome which was the bodyweight of CD1 mice, repeatedly measured over 10 consecutive weeks.Please refer to Zhou et al. [13] and King et al. [16] for more detailed descriptions of the data. Note that although Zhou et al. [13] was partially driven by an animal model, the proposed methods are potentially applicable in G × E studies of human diseases as long as the selection of interactions is conducted in a group manner.
In this article, we demonstrate the R package interep [17] that implements the GEEbased penalization methods proposed in Zhou et al. [13] for the simultaneous selection of main and interaction effects on the individual and group level, correspondingly. These procedures assume linear G × E interactions [9], which have not been examined in highdimensional longitudinal studies to date. On the other hand, the nonlinear G × E interactions [18][19][20] are usually dissected through the varying coefficient model and its variants, which can be analyzed using the high-dimensional longitudinal varying coefficient models in studies such as [21,22].
The open-source R package interep is publicly available on CRAN at https://cran.rproject.org/package=interep (accessed on 11 March 2022) [17]. To facilitate rapid computation, we developed the core modules of the R package in C++. The rest of the paper is organized as follows. In Section 2, we provide an overview of penalized interaction analysis in longitudinal studies. Section 3 describes the main and supporting functions implemented in the interep package. Simulation examples designed to showcase the usage of the package are presented in Section 4. We conclude the paper with discussions in Section 5.

Data Structure and Model Setup for the Longitudinal Interaction Analysis
In the longitudinal G × E study with n subjects, we observe t i repeated measurements for the ith (1 i n) subject over time. Y ij represents the response (or phenotype) for the ith subject at time j (1 j t i ). G ij = (G ij1 , . . . , G ijp ) denotes the p-dimensional vector of genetic factors. E ij = (E ij1 , . . . , E ijq ) is the q-dimensional environmental factors coded for the treatment with (q + 1) levels. Henceforth, we assume t i = t < ∞ without the loss of generality. The G × E model consists of the genetic and environmental main effects, and the G × E interactions, which are associated with the repeatedly measured phenotypic trait, can be specified as follows: where β n = (β n0 , β n1 , β n2 , β n3 ) and Z ij = (1, E ij , G ij , (G ij ⊗ E ij ) ) are (pq + p + q + 1)dimensional vectors denoting all the main and interaction effects. β n0 is the intercept. β n1 , β n2 , and β n3 are unknown coefficient vectors of dimensions q, p, and pq, respectively. Furthermore, G ij ⊗ E ij is the pq-dimensional vector of the Kronecker product in the following form: which is adopted to model the G × E interactions. The observations measured from the same subjects are correlated, and are assumed to be independent if they are from different subjects. The random error ij (1 j t) has a mean zero and a finite variance. For convenience, we assume that i = ( i1 , . . . , it ) T follows a multivariate normal distribution N t (0, Σ i ), with Σ i as the covariance matrix for the repeated measure of the ith subject across the t time points.

An Overview of Generalized Estimating Equations in the Interaction Analysis
In the literature, G × E interactions with multiple phenotypes have been shown to be effective when the correlations among multivariate measurements can be efficiently accommodated [23]. The intra-correlative nature of longitudinal data makes it difficult to specify the joint likelihood function, especially when the responses are not normal. Liang and Zeger [3] developed the generalized estimating equation (GEE) framework to incorporate the correlations of repeated measurements from the same subject. GEE conducts marginal regression using a quasi-likelihood function. It relaxes the distributional assumption by only requiring the specification of the first two moments and a working correlation matrix for Y ij . We express the first two marginal moments of Y ij as E(Y ij ) = µ ij = Z T ij β n and Var(Y ij ) = υ(µ ij ), correspondingly. Here, υ denotes a known variance function. The estimating equation for β n can then be written as: where µ i (β n ) = (µ i1 (β n ), . . . , µ it (β n )) and Y i = (Y i1 , . . . , Y it ) . V i is the covariance matrix of Y i . In Equation (2), ∂β n is equivalent to Z i = (Z i1 , . . . , Z it ) for the t × (pq + p + q + 1) matrix of the genetic main effects and G × E interactions.
Liang and Zeger [3] proposed the estimation of V i , the t × t covariance matrix for the ith subject, through a working correlation matrix R(η) as and the t × t working correlation matrix R(η) is indexed by a finite-dimensional vector of nuisance parameters η. Widely chosen correlation structures for R(η) include first-order auto-regressive or AR(1), exchangeable and independent structures, among others. One of the defining features of GEE, as shown by Liang and Zeger [3], is that the GEE estimator of regression coefficients can be consistently estimated if η can be consistently estimated, and that this consistency holds even when the working correlation is misspecified.

Penalized Identification of G × E Interactions in the Longitudinal Study
In this study, the environmental factors of interest denote a group of binary indicators representing treatment levels. Although this group's dimensionality is limited, given the high-dimensional genetic factors and the G × E interactions, the identification of important effects requires variable selection. Representative longitudinal variable selection methods, including PGEE [4] and PQIF [6], aim at selecting the main effects and ignore the interactions. Furthermore, the G × E interaction modeled in Equation (1) is between the G factor and all the q factor levels. Its selection is in a "group in, group out" spirit, whereas the selection of main genetic factors is on the individual level. Thus, the tailored variable selection method can conduct shrinkage estimations on a mixture of group and individual levels, as proposed in Zhou et al. [13]. We first define U(β n ) as the score equation for β n : To perform regularized selection, we adopt the minimax concave penalty (MCP) as the baseline penalty function as follows, due to its computational convenience and rigorous theoretical properties [24]: where λ is the tuning parameter and γ is the regularization parameter, with the first derivative: Then, the penalized generalized estimating equation for the interaction analysis of repeated measure data is: where coefficient β n2g corresponds to the gth genetic main effect, and coefficient vector β n3h represents the effect of interaction between the hth G factor and E factors. The penalty term involving β n2g imposes individual-level shrinkage to select G factors. Meanwhile, since the E factors are in the form of a group which should be chosen or discarded in a group-wise manner, the penalty term with respect to ||β n3h || Σ h conducts group-level regularization to select G × E interactions through penalizing the empirical norm ||β n3h || Σ h .
where B h is the subset of the design matrix corresponding to the interaction between the E factor and the hth genetic factor. In addition, λ 1 and λ 2 in Equation (3) are tuning parameters controlling the amount of shrinkage in the selection of main and interaction effects.
Here, we adopt MCP as the baseline penalty due to its computational efficiency and accuracy, as observed in our past studies. Other popular choices include LASSO, adaptive LASSO and SCAD [25]. Although we conjecture that the alternative choices of baseline penalty function can be equally applicable, it is beyond the scope of our study to examine their pros and cons in the interaction studies. These baseline penalty functions and their extensions have been implemented in a number of R packages. For example, LASSO and relevant regularization methods (including ridge regression and elastic net) have been implemented in the glmnet package for generalized linear models [26]. In repeated-measurement studies, the R package PGEE has been developed for PGEE with the non-convex SCAD penalty [27]. Furthermore, please refer to R package https://CRAN. R-project.org/package=regnet (accessed on 11 March 2022) for estimation and variable selection in generalized linear models using MCP and its extensions [28,29]. In general, advanced regularization methods can be developed based on the baseline penalty functions to accommodate different patterns of sparsity [30][31][32]. In studies of complex diseases, bi-level structures are commonly found and thus motivate the development of statistical methods that efficiently incorporate such a hierarchy [33][34][35]. Regularization has been shown to be particularly efficient in accommodating bi-level or sparse group sparsity in G × E studies [9].
The dataset generated from a profiling study assessing lipid changes in weightcontrolled mice [16] motivated the development of penalized longitudinal interaction methods in [13]. This dataset has the following characteristics. First, the genetic factor is defined as lipidomics features which have been rarely examined in a high-dimensional scenario using variable selection. Second, the lipid-environment interactions have a group structure, so the selection of main and interactive effects is supposed to be accommodated simultaneously on both the group and individual levels.
Nevertheless, the uniqueness of the dataset does not limit the scope of the proposed methods. The G factors can be any multi-omics measurements, such as mRNA expressions, single-nucleotide polymorphisms (SNPs) and copy number variations (CNVs), as long as their dimensionality is large. In terms of the special form of the environmental factors, although it was constructed based on a treatment with exercise and/or dietary restrictions with multiple factor levels, the proposed methods are still applicable if the structure of E factors results in group-level selection. For instance, in cancer genomics studies, the stage of cancer is a categorical variable usually with more than two levels, which can lead to the group-level interaction structure.

Computational Algorithms
We implemented the Newton-Raphson algorithm in the interep [17] package for the efficient estimation ofβ under the objective function (3). The algorithm proceeds iteratively.
where U(β d n ) is the score function inβ d n and T(β d n ) is defined as the first-order derivative function of U(β d ): In the objective function (3), the MCP penalty is utilized on both the individual and group level to detect main and interactive effects. Accordingly, W(β d n ) is a diagonal matrix with the diagonal entries corresponding to the first-order derivatives of MCP and group MCP as follows: where is a small positive fraction (e.g., 10 −6 ) to ensure numerical stability when the denominator is close to zero. The first (q + 1) elements on the diagonal of W(β d ) are zero, indicating that the intercept and environmental factors are not under regularized selection. The terms nWβ n and nW can be adopted to approximate the first-order derivative function of MCP in the penalized score equation and the second derivative function of the MCP penalty, respectively. With fixed tunings, the vector of regression parametersβ d+1 n is estimated iteratively until convergence is achieved. The update ofβ d+1 n is terminated if the magnitude of the L1 difference betweenβ d n andβ d+1 n is below a cutoff (e.g., 10 −3 ). We observe that the convergence is usually achieved within 10 iterations.
The penalized GEE (3) involves two tuning parameters λ 1 and λ 2 , and a regularization parameter γ. λ 1 and λ 2 determine the level of sparsity in MCP and group MCP, separately. The regularization parameter γ balances the smoothness of the resulting estimator and its unbiasedness. The insensitivity of specifying γ has been observed by Zhou et al. [13] and other studies [28,29]. We adopt K-fold cross-validation to determine the optimal pair of (λ 1 , λ 2 ). The dataset is split into K non-overlapping, roughly equal-sized parts. The kth (k = 1, . . . , K) fold is held out as the testing data, and the rest are used as training data. We use n −k and n k as the index sets to denote subjects in the testing and training set, respectively. Then the prediction error can be computed as whereβ n k is the penalized estimate obtained based on the training data, and |n −k | is the cardinality of n −k . The computation will be conducted for each one of the K folds, leading to a cross-validation error expressed as Over a two-dimensional grid of (λ 1 , λ 2 ), the best pair of tunings is chosen as the one yielding the smallest cross-validation error. The algorithm proceeds as follows: 1 Specify an appropriate range for the two-dimensional grid of (λ 1 , λ 2 ); 2 For a fixed (λ 1 , λ 2 ), (a) set an initial value toβ 0 n ; compute the cross-validation error based on Equation (5). 3 For each (λ 1 , λ 2 ) over the grid, repeat Step 2 until convergence and locate the optimal pair corresponding to the smallest cross-validation error. 4 Reportβ n with respect to the optimal (λ 1 , λ 2 ).
Another popular strategy to pinpoint optimal tunings is based on independentlygenerated testing datasets. In simulation studies, as the data generating model is available, one can readily generate an independent testing dataset with a sample size much larger than the training data. In this way, the "cross" in CV is no longer necessary. The computation is thus less intensive and more efficient than cross-validation.

The R Package Interep
The interep package includes two main functions, interep and cv.interep. The function interep fits an GEE-based penalized interaction model to repeated-measure outcomes with high-dimensional main and interaction effects. The function cv.interep computes the cross-validation error over the two-dimensional grid of tuning parameters. In addition, there are supporting functions: dmcp, penalty and reformat. All these functions were developed by the authors. The core modules of the Newton-Raphson algorithm were written in C++. Therefore, the package is linked to Rcpp and RcppArmadillo [36][37][38].

The Main Functions
The standard R code for computing the regularized estimates with fixed tuning parameters provided by the interep package is i n t e r e p ( e , g , y , beta0 , c o r r e , pmethod , lam1 , lam2 , maxits ) The function interep implements step 2 of the computational algorithms described in Section 2.4. The input argument e is a matrix of environment factors. In Zhou et al. [13], the environment factors form a group of dummy variables corresponding to the treatment. Such a form is not required to specify the input argument e here, which enables a wide usage of the package as long as selection on a group level is of interest. The input argument g is a matrix of G factors which can be lipidomics features in addition to popularly examined multi-omics measurements such as gene expressions, single-nucleotide polymorphisms (SNPs), DNA methylation and copy number variations (CNVs). Instead of directly spec-ifying the entire design matrix as an input in R packages for regularized selection of the main effects, such as glmnet and PGEE [26,27], users of our package only need to provide the G and E factors, and the function interep will formulate the corresponding G × E design matrix accordingly. To offer users of the package more flexibility, we include beta0 as the argument for initial values. There are some typical choices for beta0 such as zeros or regularized estimates, using LASSO or ridge regression.
The character string argument corre can be used to specify the three working correlation structures implemented in Zhou et al. [13]. Specifically, corre = "i" denotes the independent correlation, whereas corre = "a" and corre = "e" denote AR-1 and exchangeable correlations, respectively. In addition to the proposed interaction analysis consisting of the selection of main effects on the individual level and interactions on the group level, Zhou et al. [13] includes an alternative method for comparison, which merely conducts selection on the individual level. The argument pmethod leads to the two major categories of methods under comparison. The proposed interaction method is called if corre = "mixed" is indicated, which demands an "MCP+ group MCP" type of penalty. On the other hand, corre = "individual" leads to the usage of the alternative methods, depending on MCP, to identify the effects.
The function interep computes regularized estimates for main effects and interactions under fixed tuning parameters which are provided by users as scalars for arguments lam1 and lam2, correspondingly. Note that when corre= "individual" is suggested, the second tuning parameter lam2 is not needed. The regularized estimation will only be conducted based on lam1. The input argument maxits is the maximum number of iterations allowed in the iterative algorithms.
In the interep package, the function cv.interep is closely related to the interep function described above. The R code is: cv . i n t e r e p ( e , g , y , beta0 , lambda1 , lambda2 , n f o l d s , c o r r e , pmethod , maxits ) The function cv.interep adopts the function interep to perform cross-validation over sequences of tuning parameters. Therefore, the common group of input arguments are shared between the two functions. One major difference lies in the arguments lambda1 and lambda2. These two arguments are no longer scalars. Instead, they are two user-supplied sequences for selecting individual-level main effects and group-level interaction effects, respectively. In the function cv.interep, we adopt the notations lambda1 and lambda2 to distinguish the sequences of tunings from the fixed ones (lam2 and lam2) used in the function interep. The cross-validation error will be computed over the two-dimensional grid of lambda1 and lambda2. The argument nfolds is the number of folds employed in cross validation.

Other Supporting Functions
In addition to the aforementioned core modules, the interep package also includes multiple supporting functions. MCP is the baseline penalty function adopted by Zhou et al. [13]; therefore, dmcp, the first-order derivative function of the MCP, is provided in the package. Selection can be conducted on the individual and group levels at the same time, or only on the individual level, which have been implemented through the function penalty. Argument pmethod = "mixed" refers to the bi-level selection. Similarly, pmethod = "individual" denotes the individual level selection, where the input for argument lam2 is no longer required. Furthermore, the function reformat changes the wide format of the repeated measurement to the long format. For example, with the phenotype repeatedly measured over five time points and a sample size of 250, the dimension of phenotype (or response) in wide format is 250 by 5. It changes to 1250 by 1 after applying reformat. Note that the matrix of main and interaction effects does not vary over time in Zhou et al. [13]. If the dimension is 250 by 75 for the argument x, then after reformatting, the dimension changes to 1250 by 76, including the intercept. Moreover, a simulated dataset, dat, is provided to demonstrate the penalized selection in the proposed longitudinal study. We provide more details in the next section.

Simulation
In this section, we will use a simulation example to demonstrate how to use the interep package to obtain regularized regression coefficients. The data-generating function is provided below.  ( 1 , 2 , 3 , 5 , 7 , 1 0 , 1 5 , ( p+q + 1 ) : ( p+q + 3 ) , ( p+5 * q + 1 ) : ( p+5 * q + 3 ) , ( p+10 * q + 1 ) : ( p+10 * q + 3 ) ) dat = l i s t ( y=y , e=e , g=g , index=index , c o e f=c o e f ) r e t u r n ( dat ) } Next, we used the following sample R codes to generate a dataset with 250 subjects (n = 250), 75 genetic factors (p = 75) and 3 environment factors (q = 3). Moreover, there were five time points (k = 5). Repeated measurements were assumed to be correlated with the AR(1) structure, with a correlation coefficient ρ of 0.5. The output from the R console is given below: > l i b r a r y ( i n t e r e p ) > l i b r a r y (MASS) > s e t . seed ( 1 0 0 0 ) > n = 2 5 0 ; p = 7 5 ; k = 5 ; q = 3 ; rho = 0 . 5 > dat=Data1 ( n , p , k , q , rho ) > e=dat $ e > g=dat $g > y=dat $y > dim ( e ) [ 1 ]  The R codes dat$coef provided the exact coefficients used to simulate repeated measurements in the data-generating model. These coefficients are reproducible since we set the seed before calling the function Data1. With 75 genetic factors and 3 environmental factors, the total number of main and interaction effects was 304, including the intercept. The last row of the R output shows the locations of these true effects in the vector of regression coefficients of length 304. Then, with fixed tuning parameters, we adopted the function interep to estimate the regression coefficients using the sparse interaction method with exchangeable working correlation. As the positions of true effects on the vector of regression coefficients were known and saved in the variable index.true, we were able to obtain the number of true positives (TP) and false positives (FP). The TPs and FPs were printed in the R console. For the convenience of comparison, we show the locations of true and estimated non-zero regression coefficients, which are saved in variables index.true and index.est, respectively. The intercept, which corresponds to the 1st position, is not included in the two index vectors. The first 20 entries of beta.est from the R console are also provided below.  The interep package has thus implemented six approaches-the simultaneous selection on both the individual and group levels under the exchangeable correlation (A1); the AR(1) correlation (A2) and independent correlation (A3); as well as A4-A6, denoting the incorporation of the exchangeable, AR(1), and independent workingcorrelation to conduct individual-level selection only. The computational time for all the methods is provided in Table 1 below. In Zhou et al. [13], we analyzed the longitudinal lipidomics data generated by King et al. [16] using methods implemented in the interep package. To avoid a re-analysis of the same data leading to identical findings reported in Zhou et al. [13], we have not pursued real data analysis in this study.

Discussion
In this article, we present the R package interep that implements the proposed and alternative methods from Zhou et al. [13] for high-dimensional interaction analysis with repeated-measurement data. Although the regularization methods were initially motivated by a lipidomics study [16,39,40], the R package is broadly applicable when interaction effects are in the form of groups and genetic factors are of high dimensionality. We adopt MCP as the baseline penalty function for all methods implemented in package interep.
Other baseline penalties such as LASSO, adaptive LASSO and SCAD will be considered in the future updates of the package.
We acknowledge that there exists a wide diversity of statistical methods, not necessarily within the regularization framework, that can be applied for gene-environment interaction studies. For example, recently, Liu et al. [41] developed a tree-based method to detect important gene-environment interactions when genetic factors are rare variants. Furthermore, the additive main-effect and multiplicative interaction (AMMI) model has been widely adopted in agricultural studies with multi-location variety trials [42,43]. However, Liu et al. [41] cannot accommodate longitudinal traits, and the AMMI model has rooted in ANOVA, which fails when the sample size is smaller than the number of features (i.e., the total number of main and interaction effects). In addition, statistical software such as GenStat [44] and STATISTICA [45] analyze repeated-measurement data using one-way or two-way ANOVA models which, again, require a larger sample size than the number of features. Therefore, they do not work for high-dimensional repeated-measurement data. Comparisons with methods or software that fail under high-dimensional settings are not possible. To the best of our knowledge, the most relevant alternatives have been implemented in the package interep.
Our biased search shows that only a very limited number of R packages are available for variable selection in high-dimensional longitudinal studies. The advantage and power of the interep package lies in its incorporation of group-wise sparsity in interaction studies with repeated measurements. It is also user-friendly in that the G and E factors can be specified separately as the input arguments. Other relevant packages, such as the PGEE package [27], focus on main effects and cannot be directly adopted for longitudinal interaction studies. Currently, all approaches implemented in the interep package have been developed within the generalized estimating equation (GEE) framework. The regularized GEE under interaction models can be efficiently solved using the Newton-Raphson algorithm, as demonstrated by the numerical experiments conducted in this study and in Zhou et al. [13].
The scale of longitudinal data is determined by the dimensionality of G and E factors and the sample size, as well as number of repeated measurements, which jointly affect the computational speed of package interep. In general, regularized variable selection works well when the number of features is reasonably larger than the sample size. Thus, a pre-screening step is necessary if the G factor is ultra-high-dimesional [46]. Environmental main effects are not subject to selection as E factors are usually determined in advance based on existing studies. As the selection is performed in the group manner, the number of E factors considered should not be excessively large or the stability of group-wise selection is sacrificed. In addition, although we set the time points (i.e., the number of repeated measurements) to five in the simulation study, the case study presented in Zhou et al. [13] shows the superior performance of the package when a phenotypic trait has been measured over 10 consecutive weeks. The trait is densely measured as the number of time points further increases. Therefore, an alternative is to consider functional data analysis [47].
The problem of missing data often occurs in longitudinal studies. There are three well-acknowledged missing mechanisms: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) [48,49]. In practice, it is critical to evaluate the missing mechanisms through the sensitivity analysis before model training. Currently, the interep package cannot directly process repeated measurements with missing data. Users of the package have to manually remove missing observations before further analysis. We will explore incorporating missingness, especially MCAR, into the current interaction analysis in the near future.
Robustness to outliers and model misspecification is critical for the success of G × E studies. Although adopting robust loss/likelihood functions can safeguard long-tailed distributions in phenotypes [10,12], the varying coefficient models and their extended family have been systematically examined to capture G × E interactions beyond the linear assumption [50][51][52]. In repeated-measurement studies, the identification of nonlinear interactions within the GEE and QIF frameworks is also robust to misspecifications of working correlations. QIF has improved efficiency over GEE when the working correlation is misspecified. Although relevant models have also been proposed in the literature, our limited search of R packages (through CRAN) indicates that the corresponding codes are not available in the form of R packages. The release of the R package interep will benefit practitioners from a diversity of fields in which interaction analysis of longitudinal data is of interest.
Funding: This study was supported by an Innovative Research Award from the Johnson Cancer Research Center at Kansas State University.

Institutional Review Board Statement:
This study demonstrates the statistical methods implemented in R package interep, as well as the usage of the package on simulated data. No human or animal datasets have been analyzed. The IRB is not applicable.

Data Availability Statement:
The simulated data can be reproduced by rerunning the R codes presented in the article.