QSAR Modelling of Peptidomimetic Derivatives towards HKU4-CoV 3CLpro Inhibitors against MERS-CoV

Abstract: In this paper, we report the relationship between the anti-MERS-CoV activities of some HKU4-derived peptidomimetic compounds and various molecular descriptors, using quantitative structure-activity relationship (QSAR) methods. The descriptors were computed using ChemSketch, Marvin Sketch and ChemOffice software. Principal component analysis (PCA) and multiple linear regression (MLR) were used to propose a model with reliable predictive capacity. The original data set of 41 peptidomimetic derivatives was randomly divided into training and test sets of 34 and 7 compounds, respectively. The predictive ability of the best MLR model was assessed by the determination coefficient R² = 0.691, the cross-validation parameter Q²cv = 0.528 and the external validation parameter R²test = 0.794.


Introduction
Middle East Respiratory Syndrome (MERS) is a respiratory infectious disease that emerged in Saudi Arabia in 2012 [1,2]. In addition to Saudi Arabia, Egypt, Oman and Qatar were affected by this outbreak, with a high percentage of cases (>85%) [3][4][5]. The outbreak continued to spread until 2015, affecting 27 countries in Asia. Among these countries, South Korea was the most affected, with 186 confirmed cases including 38 deaths. Approximately 35% of patients with MERS have died, but this may be an overestimate of the true mortality rate [6]. MERS-CoV is a zoonotic virus, transmitted from animal reservoirs to humans [7,8]. The virus appears to cause more severe disease in older people, people with weakened immune systems, and those with chronic diseases such as renal disease, cancer, chronic lung disease and diabetes. In 2019, 203 new cases of MERS-CoV were reported. So far, neither a vaccine nor an effective treatment is available for this disease. Researchers throughout the world have made several efforts to develop an effective therapy against MERS-CoV infection. Many previous studies have shown that MERS-CoV possesses a single-stranded positive-sense RNA genome with 2 open reading frames (ORFs) and encodes two polyprotein precursors [9][10][11][12], which are cleaved by the 3C-like protease (3CLpro) and a papain-like cysteine protease (PLpro) to generate 16 nonstructural proteins (NSP1−16) [13][14][15][16]. Thus, 3CLpro represents a potential target for antiviral drug development. Nowadays, very few data are available on MERS-CoV 3CLpro inhibition by active molecules. Furthermore, HKU4-CoV 3CLpro shares a high sequence identity (81%) with the MERS-CoV enzyme and thus represents a potential surrogate model for anti-MERS drug discovery [17].
A quantitative structure−activity relationship approach attempts to explore the relationship between the molecular descriptors that describe the physicochemical properties of the studied compounds and their respective biological activity [18]. It encodes the chemical structure through a variety of molecular descriptors, such as constitutional, topological, thermodynamic, electronic and geometrical descriptors. The development of new cheminformatics software allows the calculation of a thousand molecular descriptors [19].
This study aims to build QSAR models that explain the relationship between the anti-MERS-CoV activity and the structure of 41 peptidomimetic derivatives, based on physicochemical descriptors and statistical methods. Multiple linear regression (MLR) was used for numerical characterization of the compounds, based on the descriptors selected by PCA. The quality of the developed QSAR model was checked using statistical parameters and several validation methods.
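To make the modelling and validation steps concrete, the following is a minimal sketch of an MLR fit together with the determination coefficient R² and a cross-validated Q². It assumes that Q²cv is obtained by leave-one-out cross-validation, which the text does not state. The descriptor values and activities are invented toy numbers, and the function names (`fit_mlr`, `q2_loo`, and so on) are hypothetical rather than taken from the software used in the study.

```python
# Illustrative sketch only: toy data, hypothetical function names.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(X, y):
    """Return [intercept, b1, ..., bk] from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coef, X):
    """Predict activities for descriptor rows X."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def r_squared(y_obs, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y_obs."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    preds = []
    for i in range(len(y)):
        coef_i = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef_i, [X[i]])[0])
    return r_squared(y, preds)

# Toy training set: two descriptors per compound, activities with small noise.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
y = [1.1, 2.9, 0.2, 2.0, 4.1, 0.8]

coef = fit_mlr(X, y)
r2 = r_squared(y, predict(coef, X))
q2 = q2_loo(X, y)
print("coefficients:", coef)
print("R2 =", round(r2, 3), " Q2(LOO) =", round(q2, 3))
```

Applying `r_squared` to the predictions for held-out test compounds gives one common form of the external validation parameter; other definitions of R²test exist, so this should be read as one possible convention.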

Data Set
A series of 41 peptidomimetic derivatives was studied for their anti-HKU4 activity [13]. Table 1 presents the structures and the activity values of these compounds (pIC50 = −log (IC50)). The compounds of this series were drawn using ChemDraw, available in the ChemOffice software, as shown in Table 1, and the descriptors were calculated using ChemSketch, ChemOffice and Marvin Sketch software. The studied compounds were randomly divided into a training set of 34 compounds, used to build the QSAR models, and a test set of 7 compounds, used to evaluate the predictive power of the models.
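As a small worked example of the activity transformation pIC50 = −log (IC50), the sketch below assumes a base-10 logarithm and IC50 values expressed in mol/L. The unit is not stated in the text, and the helper names are hypothetical.

```python
import math

def pic50_from_molar(ic50_molar):
    """Return pIC50 = -log10(IC50), with IC50 in mol/L (assumed unit)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)

def pic50_from_micromolar(ic50_um):
    """Convenience wrapper for IC50 values reported in micromolar."""
    return pic50_from_molar(ic50_um * 1e-6)

# A lower IC50 (more potent compound) gives a higher pIC50.
print(pic50_from_molar(1e-6))       # 1 micromolar, pIC50 close to 6
print(pic50_from_micromolar(10.0))  # 10 micromolar, pIC50 close to 5
```

Working on the logarithmic scale keeps the activity values in a narrow numeric range, which is the usual reason for modelling pIC50 rather than IC50 directly.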
In QSAR studies, it is recommended that the dataset be divided into training and test sets in a 5:1 ratio [20]. In the present study, the QSAR models were built following the OECD principles for acceptable QSAR models. Accordingly, the whole dataset was randomly split several times into training and test sets of the same size, and an MLR model was built for each split. This approach led to the generation of QSAR models possessing excellent statistical performance. Of the 41 compounds in the dataset, 34 were selected for the training set used to build the QSAR models, and the remaining 7 compounds were considered as the test set used to evaluate the predictive power of the models [21,22].
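The random division described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the compound identifiers 1 to 41, the seeds and the function names are hypothetical, and only the set sizes (34 and 7) come from the text.

```python
import random

def split_dataset(compound_ids, n_test=7, seed=0):
    """Randomly split compound identifiers into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    test = sorted(shuffled[:n_test])
    train = sorted(shuffled[n_test:])
    return train, test

def repeated_splits(compound_ids, n_splits=5, n_test=7, base_seed=0):
    """Generate several random splits of identical size, one per seed."""
    return [split_dataset(compound_ids, n_test, base_seed + i)
            for i in range(n_splits)]

ids = list(range(1, 42))  # 41 compounds, numbered 1..41 for illustration
train, test = split_dataset(ids)
print(len(train), "training compounds;", len(test), "test compounds")

splits = repeated_splits(ids)
print(len(splits), "alternative splits generated")
```

With 41 compounds and 7 held out, the training-to-test ratio is 34:7, close to the recommended 5:1.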

Table 1. Chemical structures and experimental activities of the 41 peptidomimetic compounds.

N°
Nowadays, very few data are available on MERS-CoV 3CL pro inhibition by active molecules. Furthermore, HKU4-CoV 3CL pro shares a high sequence identity (81%) with the MERS-CoV enzyme and thus represents a potential surrogate model for anti-MERS drug discovery [17].
A quantitative structure−activity relationship approach attempts to explore the relationship between molecular descriptors that describe the unique physicochemical properties of the studied compounds and their respective biological activity [18]. It encodes the chemical structure through a variety of molecular descriptors, such as constitutional, topological, thermodynamic, electronic, geometrical. The development of new cheminformatics software allows the calculation of a thousand molecular descriptors [19].
This study aims to build QSAR models, which explain the relationship between anti-MERS-CoV activity and the structure of 41 peptidomimetic based on physicochemical descriptors using statistical methods. Multiple linear regression (MLR) was used for numerical characterization of the compounds based on the selected descriptors by PCA. The quality of the developed QSAR model was checked using statistical parameters and several validation methods.

Data Set
A series of 41 peptidomimetic derivatives was studied for their anti-HKU4 activity [13]. Table 1 presents the structure and the activity values for these compounds (pIC50 = −log (IC50)). The compounds of this series were drawn using the ChemDraw, available in ChemOffice software, as shown in Table 1, and the descriptors were calculated using ChemSketch, ChemOffice and Marvin Sketch software. The studied compounds were randomly divided into a training set used to build QSAR models and a test set used to evaluate the predictive power of models, consisting of 34 and 7 compounds, respectively.
In QSAR studies, it is recommended that the dataset is divided into several training and test sets (5:1 ratio) [20]. In the present study, QSAR models have been built following the OECD principles for acceptable QSAR models. This approach led to the generation of QSAR models possessing excellent statistical performance. Therefore, the whole dataset was randomly split into training and test sets by a good number of MLR models with the same size of training and test sets. Of the chemicals in the dataset, 35 compounds were selected for the training set used to build QSAR models and the remaining (7 compounds) were considered as the test set used to evaluate the predictive power of the models [21,22]. Nowadays, very few data are available on MERS-CoV 3CL pro inhibition by active molecules. Furthermore, HKU4-CoV 3CL pro shares a high sequence identity (81%) with the MERS-CoV enzyme and thus represents a potential surrogate model for anti-MERS drug discovery [17]. A quantitative structure−activity relationship approach attempts to explore the relationship between molecular descriptors that describe the unique physicochemical properties of the studied compounds and their respective biological activity [18]. It encodes the chemical structure through a variety of molecular descriptors, such as constitutional, topological, thermodynamic, electronic, geometrical. The development of new cheminformatics software allows the calculation of a thousand molecular descriptors [19].
This study aims to build QSAR models, which explain the relationship between anti-MERS-CoV activity and the structure of 41 peptidomimetic based on physicochemical descriptors using statistical methods. Multiple linear regression (MLR) was used for numerical characterization of the compounds based on the selected descriptors by PCA. The quality of the developed QSAR model was checked using statistical parameters and several validation methods.

Data Set
A series of 41 peptidomimetic derivatives was studied for their anti-HKU4 activity [13]. Table 1 presents the structure and the activity values for these compounds (pIC50 = −log (IC50)). The compounds of this series were drawn using the ChemDraw, available in ChemOffice software, as shown in Table 1, and the descriptors were calculated using ChemSketch, ChemOffice and Marvin Sketch software. The studied compounds were randomly divided into a training set used to build QSAR models and a test set used to evaluate the predictive power of models, consisting of 34 and 7 compounds, respectively.
In QSAR studies, it is recommended that the dataset is divided into several training and test sets (5:1 ratio) [20]. In the present study, QSAR models have been built following the OECD principles for acceptable QSAR models. This approach led to the generation of QSAR models possessing excellent statistical performance. Therefore, the whole dataset was randomly split into training and test sets by a good number of MLR models with the same size of training and test sets. Of the chemicals in the dataset, 35 compounds were selected for the training set used to build QSAR models and the remaining (7 compounds) were considered as the test set used to evaluate the predictive power of the models [21,22]. and test sets (5:1 ratio) [20]. In the present study, QSAR models have been built following the OECD principles for acceptable QSAR models. This approach led to the generation of QSAR models possessing excellent statistical performance. Therefore, the whole dataset was randomly split into training and test sets by a good number of MLR models with the same size of training and test sets. Of the chemicals in the dataset, 35 compounds were selected for the training set used to build QSAR models and the remaining (7 compounds) were considered as the test set used to evaluate the predictive power of the models [21,22]. and test sets (5:1 ratio) [20]. In the present study, QSAR models have been built following the OECD principles for acceptable QSAR models. This approach led to the generation of QSAR models possessing excellent statistical performance. Therefore, the whole dataset was randomly split into training and test sets by a good number of MLR models with the same size of training and test sets. Of the chemicals in the dataset, 35 compounds were selected for the training set used to build QSAR models and the remaining (7 compounds) were considered as the test set used to evaluate the predictive power of the models [21,22]. 
ChemSketch, ChemOffice and Marvin Sketch software. The studied compounds were randomly divided into a training set used to build QSAR models and a test set used to evaluate the predictive power of models, consisting of 34 and 7 compounds, respectively. In QSAR studies, it is recommended that the dataset is divided into several training and test sets (5:1 ratio) [20]. In the present study, QSAR models have been built following the OECD principles for acceptable QSAR models. This approach led to the generation of QSAR models possessing excellent statistical performance. Therefore, the whole dataset was randomly split into training and test sets by a good number of MLR models with the same size of training and test sets. Of the chemicals in the dataset, 35 compounds were selected for the training set used to build QSAR models and the remaining (7 compounds) were considered as the test set used to evaluate the predictive power of the models [21,22]. A series of 41 peptidomimetic derivatives was studied for their anti-HKU4 activity [13]. Table 1 presents the structure and the activity values for these compounds (pIC50 = −log (IC50)). The compounds of this series were drawn using the ChemDraw, available in ChemOffice software, as shown in Table 1, and the descriptors were calculated using ChemSketch, ChemOffice and Marvin Sketch software. The studied compounds were randomly divided into a training set used to build QSAR models and a test set used to evaluate the predictive power of models, consisting of 34 and 7 compounds, respectively.
In QSAR studies, it is recommended that the dataset is divided into several training and test sets (5:1 ratio) [20]. In the present study, QSAR models have been built following the OECD principles for acceptable QSAR models. This approach led to the generation of QSAR models possessing excellent statistical performance. Therefore, the whole dataset was randomly split into training and test sets by a good number of MLR models with the same size of training and test sets. Of the chemicals in the dataset, 35 compounds were selected for the training set used to build QSAR models and the remaining (7 compounds) were considered as the test set used to evaluate the predictive power of the models [21,22].

Data Set
A series of 41 peptidomimetic derivatives was studied for their anti-HKU4 activity [13]. Table 1 presents the structures and activity values of these compounds (pIC50 = −log(IC50)). The compounds were drawn using ChemDraw, available in the ChemOffice software, as shown in Table 1, and the descriptors were calculated using ChemSketch, ChemOffice and MarvinSketch.
In QSAR studies, it is recommended that the dataset be divided into training and test sets in a ratio of about 5:1 [20]. In the present study, the QSAR models were built following the OECD principles for acceptable QSAR models, an approach that led to models with good statistical performance. The whole dataset was therefore randomly split into training and test sets of fixed size, and a number of MLR models were built: 34 compounds formed the training set used to build the QSAR models, and the remaining 7 compounds formed the test set used to evaluate their predictive power [21,22].
Multiple linear regression (MLR) was used for numerical characterization of the compounds based on the descriptors selected by PCA. The quality of the developed QSAR model was checked using statistical parameters and several validation methods.
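As an illustration of the data preparation described above, the following minimal sketch converts an IC50 value to pIC50 and draws one random 34/7 split. It assumes that IC50 is expressed in mol/L (the unit is not stated here); the function names and the fixed random seed are hypothetical choices made for reproducibility, not part of the original workflow.

```python
import math
import random


def pic50(ic50_molar):
    """Convert an IC50 value (assumed to be in mol/L) to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)


def random_split(n_compounds, n_test, seed=0):
    """Randomly split compound indices into a training set and a test set."""
    indices = list(range(n_compounds))
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


# 41 compounds, 7 of them held out as the external test set
train_idx, test_idx = random_split(41, 7)
print(len(train_idx), len(test_idx))
print(pic50(1e-6))
```

Repeating the call with different seeds gives several splits of identical size, which is one way to realise the repeated random division mentioned above.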

ChemSketch software was used to calculate the formula weight (FW), the percentages of carbon, hydrogen, nitrogen, oxygen and sulfur atoms (% C, % H, % N, % O and % S), molar volume (MV (cm³)), parachor (Pa (cm³)), refractive index (RI), surface tension (ST (dyne/cm)), density (D (g/cm³)), polarizability (Po (cm³)), ring double bond equivalents (RDBE) and nominal mass (NM (Da)) (Table S1).
MarvinSketch and ChemOffice were used to build the structures and to calculate the following descriptors: octanol-water partition coefficient (Log P), hydrophilic-lipophilic balance (HLB (kcal/mol)), MMFF94 energy (ME (kcal/mol)), polar surface area (PSA), van der Waals surface area (VDWSA), van der Waals volume (VDWV), refractivity (R), number of H-bond acceptors (NHA), number of H-bond donors (NHD) and molar refractivity, among others.

Statistical Analysis
In this study, XLSTAT [23] was used to perform both principal component analysis (PCA) and multiple linear regression (MLR). PCA allows us to reduce the number of descriptors and to keep only those that are closely related to the activity. It also relies on examining the correlation matrix and removing descriptors that are highly correlated with one another. MLR was then applied to establish a mathematical relationship between the inhibitory activity and a set of molecular descriptors. Both statistical methods rest on the assumption that a relationship exists between the dependent variable (activity) and a series of independent variables (descriptors).
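The removal of highly intercorrelated descriptors can be sketched as follows. This is not the XLSTAT procedure itself, only a minimal illustration: the cut-off of 0.9 on the absolute Pearson coefficient, the function names and the toy descriptor values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(descriptors, threshold=0.9):
    """Keep a descriptor only if |r| with every already kept one is below threshold.

    descriptors: dict mapping descriptor name -> list of values (one per compound).
    The threshold value is an assumed cut-off, not one taken from the study.
    """
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values for five hypothetical compounds; MV is a linear function of FW
toy = {
    "FW": [300.0, 320.0, 350.0, 400.0, 410.0],
    "MV": [250.0, 262.0, 280.0, 310.0, 316.0],
    "LogP": [2.1, 1.0, 3.5, 0.7, 2.9],
}
print(drop_correlated(toy))
```

The result depends on the order in which descriptors are visited, since the first member of a correlated pair is the one retained.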

Validation of the QSAR Model
The predictive power of the built QSAR models was checked using internal and external validations.
Leave-one-out (LOO) cross-validation was used for the internal validation, and the cross-validation parameter Q²cv was calculated. However, several previous studies have suggested that the only way to estimate the true predictive power of a QSAR model is to compare the predicted and observed activities for an external test set of compounds that were not used in the model's development [24][25][26][27][28][29]. The quality of a QSAR model is mostly determined by its ability to make predictions for compounds not included in the training set. The external validation parameter R²test was therefore calculated.
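The two validation parameters can be illustrated with a small, self-contained sketch. It assumes the common definitions Q²cv = 1 − PRESS/SStot for leave-one-out and R²test = 1 − SSres/SStot computed on the test set (other conventions exist, and the one used by the software may differ); the least-squares solver, the function names and the toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, x):
    """Predicted activity for one compound with descriptor values x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def q2_loo(X, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    y_mean = sum(y) / len(y)
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)


def r2_test(coef, X_test, y_test):
    """External R2 of a fitted model on compounds not used for fitting."""
    ss_res = sum((t - predict(coef, x)) ** 2 for x, t in zip(X_test, y_test))
    y_mean = sum(y_test) / len(y_test)
    return 1.0 - ss_res / sum((t - y_mean) ** 2 for t in y_test)


# Toy data: two hypothetical descriptors, activities close to 1 + 2*x1 - x2
X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y_train = [1.1, 3.9, 2.2, 5.8, 3.1, 9.0]
X_test = [[2.0, 4.0], [5.0, 2.0], [7.0, 9.0]]
y_test = [0.9, 9.2, 6.1]

coef = fit_mlr(X_train, y_train)
print("Q2cv   =", q2_loo(X_train, y_train))
print("R2test =", r2_test(coef, X_test, y_test))
```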
The y-randomization test was also used to validate the developed QSAR models, whereby the performance of the original model in data description (R²) was compared to that of models built on randomized data. In this test, random MLR models were generated by randomly shuffling the dependent variable while keeping the independent variables unchanged. The random models are expected to have markedly lower R² and Q² values over several trials; if so, the developed QSAR model is robust and not the result of chance correlation. Another parameter, cR²p, was also calculated; it should be greater than 0.5 [24].
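A minimal sketch of the y-randomization procedure is given below. For brevity it uses a one-descriptor linear fit as a stand-in for the MLR model, and it assumes the commonly cited definition cR²p = R × sqrt(R² − mean R² of the random models); the number of trials, the seed, the function names and the toy data are hypothetical.

```python
import math
import random


def r2_single(x, y):
    """R2 of a one-descriptor linear fit (equal to the squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, fit_r2, n_trials=100, seed=0):
    """Shuffle the activities, refit, and compare with the original R2.

    fit_r2 is any function returning R2 for (descriptors, activities);
    the descriptors are left untouched, only y is permuted.
    """
    rng = random.Random(seed)
    r2 = fit_r2(x, y)
    random_r2 = []
    for _ in range(n_trials):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)
        random_r2.append(fit_r2(x, y_shuffled))
    mean_random_r2 = sum(random_r2) / n_trials
    # Assumed definition: cR2p = R * sqrt(R2 - mean random R2)
    crp2 = math.sqrt(r2) * math.sqrt(max(r2 - mean_random_r2, 0.0))
    return r2, mean_random_r2, crp2


# Toy data: 20 hypothetical compounds with a strong linear trend
x_toy = [float(i) for i in range(20)]
y_toy = [2.0 * v + 1.0 + (0.3 if i % 2 else -0.3) for i, v in enumerate(x_toy)]
print(y_randomization(x_toy, y_toy, r2_single))
```

Passing the fitting routine as a function keeps the shuffling logic independent of the model, so the same loop could wrap a multi-descriptor regression.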

Principal Components Analysis (PCA)
Thirty descriptors were calculated using ChemSketch, MarvinSketch and ChemOffice software (Tables S1-S3). The correlation matrix obtained by PCA was analyzed to extract the important information from the multivariate data table and to express it as a small set of new variables called principal components. PCA was therefore an important step for reducing the number of descriptors with minimal loss of information.
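To make the idea of a principal component concrete, the sketch below extracts the first component of a descriptor correlation matrix by power iteration. It is a simplified stand-in for a full PCA as performed by statistical software; the function names, the iteration count and the toy values are assumptions, and constant descriptors (zero variance) are not handled.

```python
import math


def standardize(values):
    """Centre a descriptor column and scale it to unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of descriptor columns."""
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_principal_component(columns, n_iter=500):
    """Return (fraction of variance explained, loadings) of the first PC."""
    corr = correlation_matrix(columns)
    p = len(corr)
    # Non-uniform start vector, so it is unlikely to be orthogonal to the PC
    v = [1.0 / (k + 1) for k in range(p)]
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    rv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))
    # The eigenvalues of a p x p correlation matrix sum to p
    return eigenvalue / p, v


# Toy descriptor columns for five hypothetical compounds
fw = [300.0, 320.0, 350.0, 400.0, 410.0]
mv = [250.0, 262.0, 280.0, 310.0, 316.0]
logp = [2.1, 1.0, 3.5, 0.7, 2.9]
fraction, loadings = first_principal_component([fw, mv, logp])
print(fraction, loadings)
```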

Multiple Linear Regression (MLR)
The descriptors remaining after PCA were used as input for building MLR models. The best model obtained using MLR, judged by its statistical parameters, is represented by the following equation:
where R² is the coefficient of determination; R²test is the coefficient of determination of the external test set; R²adj is the adjusted coefficient of determination; MSE is the mean squared error of the model; RMSE is the root mean square error; F is the Fisher statistic; and the p-value is the probability associated with F.
According to this model, the activity depends on the following descriptors: PC, VDWV, VDWSA, NO and O%.
The values obtained for the coefficient of determination, the coefficient of determination of the external test set and the adjusted coefficient of determination all exceeded 0.6; together with the low mean squared error and root mean square error, this indicates that the established model has reliable predictive power.
Moreover, the Fisher test and its associated p-value indicate that the risk of wrongly rejecting the null hypothesis is less than 0.01%; the regression equation is therefore statistically significant.
The correlations between the predicted and observed activities are represented in Table 2 and illustrated in Figure 1.
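The fit statistics discussed in this section can be computed from the observed and predicted activities as sketched below. The sketch assumes a least-squares model with an intercept and the convention MSE = SSres/(n − p − 1), where n is the number of training compounds and p the number of descriptors; the function name and the toy values are hypothetical and are not the data of Table 2.

```python
import math


def regression_statistics(y_obs, y_pred, n_descriptors):
    """R2, adjusted R2, MSE, RMSE and F for a fitted linear model.

    Assumes an intercept was fitted, so the residual degrees of freedom
    are n - p - 1, and that MSE is the residual mean square.
    """
    n, p = len(y_obs), n_descriptors
    y_mean = sum(y_obs) / n
    ss_res = sum((o - q) ** 2 for o, q in zip(y_obs, y_pred))
    ss_tot = sum((o - y_mean) ** 2 for o in y_obs)
    dof = n - p - 1
    r2 = 1.0 - ss_res / ss_tot
    mse = ss_res / dof
    return {
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "F": ((ss_tot - ss_res) / p) / mse,
    }


# Toy observed and predicted activities for six hypothetical compounds
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
for key, value in regression_statistics(observed, predicted, 1).items():
    print(key, round(value, 4))
```

Because the adjusted R² penalises the number of descriptors, it is the more informative of the two coefficients when comparing models of different size.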

Y-Randomization
The y-randomization test was applied to verify the validity and robustness of the built model. The results (Table 3) confirmed that the model was not obtained by chance. Based on all the results obtained by MLR, we conclude that the built model has good predictive power.

Conclusions
In this study, thirty descriptors were calculated for 41 peptidomimetic derivatives using ChemSketch, MarvinSketch and ChemOffice software. These descriptors were subjected to a statistical study using PCA, which was used to analyze and visualize the dataset and to group the data into principal components. A linear model combining five descriptors was then found using the MLR method to predict the pIC50 activity. The QSAR model proposed by MLR in this study is statistically significant and has sufficient capacity to predict the anti-MERS-CoV activity.
Funding: The authors are thankful to the "Agence Universitaire de la Francophonie (AUF)" for financial support under the project AUF-463/2020.