An Intelligent Hybrid Sentiment Analyzer for Personal Protective Medical Equipments Based on Word Embedding Technique: The COVID-19 Era

: Due to the accelerated growth of symmetrical sentiment data across different platforms, experimenting with different sentiment analysis (SA) techniques allows for better decision-making and strategic planning for different sectors. Speciﬁcally, the emergence of COVID-19 has enriched the data of people’s opinions and feelings about medical products. In this paper, we analyze people’s sentiments about the products of a well-known e-commerce website named Alibaba.com. People’s sentiments are experimented with using a novel evolutionary approach by applying advanced pre-trained word embedding for word presentations and combining them with an evolutionary feature selection mechanism to classify these opinions into different levels of ratings. The proposed approach is based on harmony search algorithm and different classiﬁcation techniques including random forest, k-nearest neighbor, AdaBoost, bagging, SVM, and REPtree to achieve competitive results with the least possible features. The experiments are conducted on ﬁve different datasets including medical gloves, hand sanitizer, medical oxygen, face masks, and a combination of all these datasets. The results show that the harmony search algorithm successfully reduced the number of features by 94.25%, 89.5%, 89.25%, 92.5%, and 84.25% for the medical glove, hand sanitizer, medical oxygen, face masks, and whole datasets, respectively, while keeping a competitive performance in terms of accuracy and root mean square error (RMSE) for the classiﬁcation techniques and decreasing the computational time required for classiﬁcation. process to select the best subset of features using different classiﬁers for evaluation. Six different classiﬁcation techniques including RF, k-NN, AdaBoost, bagging, SVM, and REPtree are examined for classifying the sentiment dataset. The results include the accuracy and RMSE values of the classiﬁcation process with the feature selection process and without it. The results show that applying the feature


Introduction
The use of social media is growing rapidly; it represents now, more than ever, a big part of our lives [1,2]. People from different countries are sharing their feelings, thoughts, attitudes, opinions, and concerns about different life aspects on these platforms on daily basis. The massive amount of data gained by different social channels has become the main source of information for different domains such as business, governments, and health [3]. Thus, the fast growth of information combined with the existence of advanced data mining and sentiment analysis (SA) techniques present an opportunity to mine these information in different sectors [4]. Analyzing these data is essential for decision-making and strategic planning in these fields [3,5].
Medhat et al. [6] has defined sentiment analysis (SA) or opinion mining (OM) as "the computational study of people's opinions, attitudes and emotions toward an entity". Sentiment analysis is a multidisciplinary field that focuses on analyzing people attitudes, reviews, feedback, and concerns toward different aspects of life including products, services, this huge amount of data, it is essential to collect and analyze this information by capturing the medical sentiment, which will be helpful for patients, decision-makers, and the whole health sector [19].
In this context, Jiménez-Zafra et al. [4] analyzed people's opinions posted in medical forums regarding doctors and drugs. It was found that drug reviews were more difficult to analyze than doctors' reviews. Although both reviews were written in informal language by non-professional users, drug reviews have greater lexical diversity. Abualigah et al. [21] had presented a brief review about using sentiment analysis in analyzing data about people's experience in healthcare medication, treatments, or diagnosis posted on their personal blogs, online forums, or medical websites. Patients visited different healthcare centers and shared their experiences concerning services, pleasure, and availability. Using sentiment analysis can help patients learn from others' experiences, report medical problems and resolve them, as well as improve medical decisions and increase healthcare quality. Another recent study by Polisena et al. [22] performed a scoping review about using sentiment analysis in health technology assessment (HTA). The study used the patients' posts on different social medial platforms about the effectiveness and safety of these health technologies such as medical devices, HPV vaccination, and drug therapies.
The COVID-19 outbreak began in late December 2019 and spread rapidly worldwide. The World Health Organization (WHO) announced it as a global pandemic on 11 March 2020 [23]. The pandemic had a negative effect on the biggest companies in different sectors and the whole productive system around the globe [12]. On the other hand, some companies in the medical product sector producing personal protective equipment (PPE) witnessed an increased demand for their products during the pandemic. The World Health Organization (WHO) named 17 products as the key products needed to deal with the pandemic [24], including personal protection equipment such as gloves and face masks or some medical devices for case management, such as oxygen sensors, oxygen concentrators, and respirators, in addition to sterilizers and pharmaceutical companies [25]. These medical supplies have suffered from a dramatic shortage during the pandemic due to their huge demand around the globe [24,25].
Due to the extreme importance of PPE products during the pandemic and the availability of different brands with different qualities on social commerce websites, people are seeking the best product and searching for customers' reviews about these medical products online. Even before the pandemic, people were obtaining information about different products from various social media platforms. Nowadays, when any person wants to buy a new product online, they will search for people's reviews and comments about that product on different social channels [14]. A study conducted by Deloitte has a similar opinion; it stated that "82% of purchase decisions have been directly influenced by reviews" [26].
This research studies people's sentiment regarding the main personal protective equipment (PPE) used during the pandemic including; face masks, medical gloves, hand sanitizer, and medical oxygen. Face masks, such as surgical masks and N95 respirators, represent the most essential product in fighting COVID-19. These masks are used by patients, healthcare workers, and all people to protect them from being infected. Certain types of masks can be reused or worn for a longer duration under certain circumstances. It was found that the demand for facial masks had increased ten times during the pandemic compared to world production prior to the crisis, and the price has increased as well [27]. In addition, medical gloves have a similar usage during the COVID-19 outbreak, especially when contacting any potentially infectious patients or materials. People become more aware of the importance of using gloves, and it is expected that they will continue using them even after the epidemic [28]. Hand sanitizer is another important product spread widely during the pandemic; it can be found in the entrances of hospitals, companies, malls, shops, schools, and all different buildings. People are buying and using it extensively and regularly [25]. Furthermore, in the context of the current pandemic, medical oxygen presents an essential step for the recovery of patients with severe COVID-19 [29]. According to Stein et al. [29], 41.3% of COVID-19 hospitalised patients in China need supplemental oxygen. Thus, huge supplies of medical oxygen are needed during the pandemic, hospitals and ICUs are at full capacity, and the whole health system is overwhelmed [30]. Therefore, home oxygen concentrators have become one of the primary sources of oxygen during the pandemic, since patients with middle-to-severe symptoms need home monitoring and treatment [31].
Motivated by the importance of the aforementioned medical products in saving lives during the pandemic and the significance of analyzing people sentiments regarding these products, this study takes the initiative to collect people's comments and reviews about COVID-19 medical products from Alibaba.com, one of the biggest e-commerce websites in the world to find the best PPE with the highest quality.
The study proposes a novel evolutionary approach aiming to classify people's sentiment towards these products, as online reviews are the most influential factor in the decision-making process. Moreover, the following approach applies an evolutionary feature selection method to select the best subset of features and enhance the classification performance in terms of accuracy and computational time. On the other hand, the proposed approach can be used in handling such crisis and medical situations quickly and efficiently in the future.
Furthermore, the contribution of this study can be summarized by the following points: • A new sentiment analysis study for classifying people's opinions towards medical products that are related to the COVID-19 crisis, including gloves, hand sanitizers, face masks, and home oxygen concentrators, to provide decision-makers with analyzed observations of customers' feedback to help them take prompt actions of the effectiveness of the products. • Applying advanced pre-trained word embedding learning techniques for feature extraction and word presentation to overcome the challenges of the data. • Conduct an evolutionary feature selection method to select the best subset of features, which are extracted by the word embedding technique, using different classifiers for evaluation.
The remainder of the paper is divided as follows: In Section 2, a literature review on sentiment analyses during COVID-19 is performed. A brief overview of the methods and concepts is described in Section 3. Section 4 describes the proposed approach. Section 5 explains the experiments and results of this study. Finally, Section 6 summarizes the conclusions future directions.

Related Work
Sentiment analysis (SA) can be briefly defined as detecting feelings expressed implicitly within the text. This text may be describing opinion, review, or behaviour towards an event, product, or organization [32]. SA has been a trend during the last decade to facilitate decision-making and support domain analysts' missions in a wide range of practical applications such as healthcare, finance [33], media, consumer markets [34], and government [35,36].
Due to the widespread use of online platforms such as Twitter, Facebook, Instagram, and WhatsApp, especially during the COVID-19 lockdown internationally, people are posting online reviews regarding used products. This resulted in extensive usage of consumer sentiment analysis (CSA) using online reviews [37]. These online reviews in which people indicate their opinions or attitude toward a service or product are found to be beneficial to organizations in analyzing customers' experience for improvement purposes [38].
In the medical domain, many research papers study the reviews of customers in various aspects. Gohil et al. [39] proposed a systematic literature review (SLR) on healthcare sentiment analysis by reviewing research studies targeting hospitals and healthcare reviews published on Twitter, while Lagu et al. [40] studied patient online reviews and their experience expressed using physician-rating websites in the United States. There were 66 potential websites, 28 of which met the inclusion criteria set of the study, with 8133 reviews found for 600 physicians. Additionally, Liu et al. [41] examined patients' satisfaction towards online pharmaceutical websites. The study involved several aspects of overall online B2C aspects, namely product, logistics, factors, price, information, and system used. Moreover, the Na and Kyaing [42] proposed a SA method based on the lexicon, grammatical relations, and semantic annotation on drug review websites. The approach was examined on 2700 collected reviews, and results found that applying it would be useful to drug makers and clinicians as well as patients.
SA is conducted based on the lexicon, machine learning (ML), or graph [35]. Lexiconbased works can be found in some studies. Jiménez-Zafra et al. [4] examined combining supervised learning and lexicon-based SA for reviews about both drugs and doctors. Reviews were extracted from two Spanish forums: DOS for drug reviews and COPOS for reviews about physicians. SVM classifier were applied with four different word representations (TF-IDF, TF, BTO, and Word2Vec).
Many researchers conducted SA using ML in the medical field. Gräßer et al. [43] examined SA on pharmaceutical review sites. The aim was to predict the overall satisfaction, side effects, and effectiveness of specific drugs. Data were extracted from two web pages, Drugs.com and Druglib.com. The ML classifier considered in the study was logistic regression (LR). Results were found to be promising, and a further extension by applying deep learning (DL) was recommended. Moreover, Daniulaityte et al. [44] applied a supervised ML SA to classify tweets published about drug abuse. Data were extracted from Twitter through Twitter streaming application programming interface (API). Logistic regression (LR), naïve Bayes (NB), and support vector machines (SVM) classifiers were used with 5-fold cross-validation along with F1 measure as an evaluation metric. Another study performed by Basiri et al. [45] analyzed medical and healthcare reviews in which two novel deep fusion frameworks were proposed. The first, (3W1DT), used the DL model as a base classifier (BC) and a traditional ML classifier such as naïve Bayes (NB) or decision tree (DT) as a secondary classifier (SC). The second, 3W3DT, used three DL and one traditional ML model. In both DL models, ML models including NB, DT, RF, and KNN were combined to improve classification performance. Data were extracted from Drugs.com, and proposed models were tested in terms of accuracy and F1-measure.
Other researchers combined lexicon-based with ML techniques. A study by Harrison and Sidey-Gibbons [46] conducted a natural language processing (NLP)-based SA on the following drugs: Levothyroxine, Viagra, Oseltamivir, and Apixaban in three main phases. The first phase applied lexicon-based SA, and the second one used unsupervised ML (latent Dirichlet allocation, LDA) to distinguish reviews written on similar drugs within the dataset. The third phase considered predicting negative or positive reviews on drugs using three supervised ML algorithms: regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). Finally, a comparison analysis was performed between the used classifiers based on classification accuracy, the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
However, lexicon-based SA suffers from several drawbacks, especially in the medical field. Firstly, it relies on a predefined list of polarity annotation, which differs based on the used language. Secondly, the language used in reviews is usually informal or slang, which probably is not included in the lexicon. Thirdly, medical terms' and words' meanings may differ depending on the context, which may be misleading. As a result, ML-based SA was found to provide a good alternative [39].
In addition, the aforementioned studies that rely on using ML techniques for medical applications considered sentiments for the reviews of drugs and doctors. They have not considered analyzing sentiments about medical products for COVID-19 crises, which allows decision-makers to take prospective and immediate actions toward improving these products during the crisis.
On the other hand, as stated by Karyotaki et al. [47], a reviewer's positive mood is affected by his attention to any internal or external condition, that is, in our domain, purchasing, reviewing comments, or any nearby customer experience denoted in word of mouth (WOM), while [48,49] proved that emotional intelligence has a significant impact on the research area, which should be taken into consideration, and emphasized the importance of studying both feelings and emotions within a specific domain.
Moreover, extracting new medical product reviews and applying pre-trained word embedding for sentiment analysis is new, since applying pre-trained word embedding for sentiment medical products and using wrapper fs for multiple classifiers at the same time for medical sentiment analysis, where each classifier is used as an evaluation for the wrapper FS, are not present in other studies. In addition, studying the quality of the medical products during COVID-19 is very important in the meantime.
As a result, this paper aims to analyze opinions and feelings expressed through Alibaba online e-commence websites towards medical products, including medical gloves, hand sanitizer, face masks, and whole datasets, which are extensively consumed during the pandemic condition of COVID-19.

Ordinal Regression
Originally, ordinal regression as a term is a statistical regression analysis and might be referred to as ordinal classification. It aims to predict ordinal data, that is, statistical data with natural and categorized variables without knowing distances between categories. Ordinal regression problems relay between classification and regression. It has been used in other fields such as psychology and social sciences. It is worth mentioning here that modeling human preference levels (on a scale from, for example, 1-5 for "very poor" through "excellent") presents an example of ordinal regression [50].
In ML, ordinal regression can be referred to as ranking learning, aiming to obtain patterns classified into naturally ordered labels based on a categorical scale. An ordinal regression classifier performs as a ranking function, in which class labels are interpreted into scores. This process results in obtaining a huge amount of scores and ties. This may be a drawback for ranking techniques such as multipartite ranking. Yet, it can be solved by including thresholds for setting intervals for every single class [51].

Evolutionary Algorithm
Evolutionary algorithms (EA) are classified under biologically inspired algorithms and based on Darwinian evolution theory and usually exploit a set of various synthetic methods such as population management, replication, variability, and selection. They were found to be simple at high-level problems but became complex when knowledge is extracted from the problem domain [52]. They have powerful capabilities in solving machine learning problems, which is a growing area of research nowadays [53].
They are also referred to as metaheuristics, which can be defined as higher-order algorithms that aim to systematically identify the best solution within the problem space. EA algorithms are in this category as well [54]. EA algorithms are executed in order to achieve an already set quantitative goal such as time function. In addition, this goal is a metric of success and an exit method for the algorithm itself, which may be either a single-objective or multi-objective goal. They are known for their effectiveness in finding optimal solutions for optimization problems [55].

Harmony Search Algorithm
HSA is a meta-heuristic optimization algorithm that was firstly proposed in 2001; it is based on designing a unique harmony in music for addressing optimization problems.
The main steps of HSA are clarified by the pseudo-code in Algorithm 1, in which every single possible solution x is necessary x X depending on certain N decision variables, and X represents the space of whole possible solutions. The main target is to find either a minimum or maximum value of fitness function F X i for all i = 1, . . . , N. Pitch adjustment rate (PAR) and bandwidth (BW) are both used to control the exploitation process in HSA, while exploration is the responsibility of harmony memory considering rate (HMCR), as explained in [56].
HSA has five main steps, as explained by [56,57]. First is the initialization of basic HS parameters, HMCR, BW, PAR, number of iterations (NI), and HM size (HMS), as well as determination of the optimization problem goal. Second is the initialization of HM values, as X i should be within upper and lower boundaries using Equation (1), where R denotes a random number between 0 and 1. Third is new harmony improvisation through combining HMCR, PAR, and BW. To do so, let x i,j be the value of the jth decision variable of ith solution; a random number is generated between 0 and 1, and improvisation will be achieved as in Equation (2).
Forth is a fitness function assessment for the newly produced solution; it is checked whether it is superior to the worst solution within HM; then, an updated memory is conducted by replacing the worst with a new solution. Finally, the fifth step is checking the stop criteria set; if it is met, the algorithm will be terminated. if rand < HMCR then 10: Memory consideration 11: if rand < PAR then 12: Pitch adjustment 13: else 14: Random consideration 15: Evaluate the fitness function of the new solution 16: Replace the worst solution in HM by the new solution Its advantages may be summarized as execution clarity, ability to tackle complex problems, and registered success and lower mathematical processes required, as mentioned in [57].

Wrapper Feature Selection
Feature selection (FS) is the process of reducing symmetrical features used to build a model in data mining [58]. In other words, It may be referred to as the process of selecting a subset of the most relevant features to be used for data mining model construction. Mostly, FS is highly recommended and applied for avoiding the curse of dimensionality. There are three types of FS, which are filter, wrapper, and embedded methods [59].
Wrapper FS methods mainly depend on a certain ML algorithm applied to the dataset. It runs a greedy search approach to examine all possible subsets, which obtain the best value of an evaluation criterion. Different performance metrics are meant by evaluation criterion; this is chosen based on the problem domain and has very popular examples such as accuracy, precision, recall, and f1-score [60].

Word Embedding Feature Extraction
Feature extraction (FE) is one of the core steps within data mining projects. It is used to determine features that have a positive effect through classification [61]. Many techniques are used to accomplish this task, such as bag-of-words, term frequency-inverse document frequency (TF-IDF), word embedding (WE), and natural language processing (NLP)-based techniques such as word count, noun count, and word2vec [62].
Word embedding is a term used to refer to word representation for text analysis; other used terms may be word representation and distributed word representation [63,64]. Typically, word meanings are encoded as vectors of real values in which close values in vector space indicate similar words in meaning. Both language modeling and feature learning techniques are required to map words or phrases from vocabulary into real number vectors to obtain WE [65].

Word2vec Embeddings
Word2vec is a technique published in 2013 by a research team in Google led by Tomas Mikolov for NLP over two studies [66,67]. It is an artificial neural network (ANN)based group of models that are used to build word embeddings through extracting word association from text within a corpus. After conducting the training phase, such a model can both detect synonyms and predict additional words for sentences. It represents every single word in form of vectors. A simple mathematical function is used to indicate semantic similarity between vectors. Nevertheless, it was mentioned tehabib2021altibbivec that Word2vec is a shallow and two-layer ANN, which takes text corpus as input and produces a vector space. Word2vec can use either of two main model architectures, namely continuous bag-of-words (CBOW) or continuous skip-gram (SG), as shown in Figure 1.

Methodology
In this section, the methodology processes of the proposed work have been presented. These processes consist of data description and collection, data preparation, and proposed approach, each of which will be discussed in detail in the following subsections.

Data Description and Collection
This work investigates the feedback of Alibaba e-commerce website customers reviews regarding several medical products during the COVID-19 situation. Alibaba is a Chinese multinational e-commerce corporation for retail, internet, and technology. The website was created in 1990 in Hangzhou, specializing in business-to-business (B2B), consumer-toconsumer (C2C), and business-to-consumer (B2C) online sales services. The website also provides different services including cloud computing, electronic payment, and shopping search engines services. In the meantime, Alibaba is considered one of the biggest ecommerce and retailer corporations. Additionally, it is ranked as the fifth-biggest artificial intelligence firm in 2020.
One of the main reasons to select the Alibaba website is due to the availability of medical products as well as the existence of many consumers' reviews for these products compared with other symmetrical websites. In this work, we are aiming to enhance the quality of products through advanced sentiment analysis of the consumer reviews due to its importance and high need from various parties, especially at this time.
The collection process was performed using a crawler tool to gather the reviews of each product on the website. Each review consists of the consumer name, the context of the review, review date, and the rating of the review, either of 1, 2, 3, 4, or 5, according to the person's assessment. It is worth noting that the ratings were crawled as images of 1 to 5 stars.
The collected data consist of medical products such as gloves, hand sanitizers, face masks, and home oxygen concentrators. we choose these products as they were the most used commodities in the last two years.

Data Preparation
The preparation of the data took various phases to be ready for the experimentation step by the models. These pre-processing phases consist of removing missing values and stop words, and cleaning, normalization, and formatting the data. The reviews did not require to be labeled, since they had already been rated by the customer. Nevertheless, to ensure that the labeling was accurate, some experts were asked to read a sample of the data. Therefore, after the preprocessing phases, the feature extraction is performed through several steps. First, we conduct a tokenization process to divide the text into a set of words. In other words, the text is transformed from raw textual data into several features, where each feature represents some kind of statistical measurement of that word. Various text vectorization models have been proposed in the literature. In terms of popularity, term frequency, term frequency-inverse document frequency, and bag-of-words are the most commonly used models. Despite their popularity, however, they do not reveal the actual meaning (semantic) of the words. Therefore, to understand the implicit relationships between words, a more comprehensive approach is needed, namely word embedding.
Embedding is the process of representing words as numerical vectors, in which the words are encoded as dense vectors that create a unique analogous encoding of all words of similar meaning as shown in Figure 2. During the training process, the density vector's components (parameters) are learned and determined. By increasing the embedding dimension, the learning ability is enhanced, although a much larger training dataset is required. In this work, the Word2Vec word embedding model is performed. In Word2Vec, three layers are presented, an input layer, a hidden layer, and an output layer. However, this neural network architecture merely learns the weights of the hidden layer that are the embeddings of words.
Since Word2Vec requires huge datasets to operate, a pre-trained word embedding method is required to overcome the lack of such data. Therefore, we used a pre-trained word embedding corpus of 10 million instances [68]. It is worth mentioning that the number of dimensions for the learning data is 400; therefore, each collected dataset will have the same number. Five different datasets have been prepared, namely medical gloves, hand sanitizer, medical oxygen, and face masks, while the fifth dataset is composed of merging all four datasets. The details of the datasets can be found in Table 1.

Proposed Approach
In this subsection, we explain the proposed approach based on harmony search algorithm and different classification models to obtain improved and competitive results using fewer features. The best subset of features is selected by using wrapper feature selection, where each classification model evaluates the output features.
In this study, the produced solution (feature subset) was designed as a set of binary vectors with length n, where n denotes the number of features in each dataset. The feature is considered selected if the corresponding value equals 1 and zero if not. The classification model evaluation (fitness) was used to determine the quality of a feature subset according to the following Equation (3): where the number of selected features denotes by |S|, while |W| presents the number of the total features in a given dataset, and α ∈ [0, 1]. Therefore, after completing the preparation of the data, we divided the data into training and testing sets. The splitting criteria used in this work was the 10-fold cross-validation. To be more precise, the dataset is divided into nine training partitions, whereas the remaining partition is assigned for testing. Meanwhile, the number of features are listed in the HSA algorithm, thus identifying the total number of values in each solution. The HSA algorithm generates a solution and provides the classification models with a subset of features for the training phase.
Afterward, through an internal 5-fold cross-validation, the classifiers start training and building their models from the given training set. The purpose of performing this extra step is to improve training and build a more robust classifier model. To improve the next solution by HSA, the evaluation of the training set will be used as a fitness value in the HSA algorithm. A termination criterion is triggered when the HSA algorithm reaches its maximum number of iterations. In case of failure to meet this termination criterion, the fitness value will continue to be provided for the HSA algorithm. When the HSA reaches the maximum number of iterations, the best subset of features will be determined as the optimal solution. Based on the best subset of features obtained from the training phase, the final result of the classification accuracy and root mean square error (RMSE) for the testing set will be calculated using these subsets. Finally, all experiments were repeated 30 times independently to obtain reliable results, and the average was reported. The methodology is depicted in Figure 3 along with all the proposed steps.

Experiment and Results
This section describes the findings from tests involving the HSA algorithm and other classification methods, both with and without feature selection. It also discusses the chosen assessment measures and the experiment's surrounding settings and conditions.

Experimental Setup
The experiments in this article were performed on a laptop running Windows 10 on an Intel Core i7 processor running at 2.40 GHz with 16 GB of RAM. Additionally, we ran our tests in the Matlab R2015a (8.5.0.197613) software environment. Further, the parameters of the classification models and HSA algorithm can be found in Table 2. Tree size 100

Evaluation Measures
The evaluation measures considered in this study include the accuracy and RMSE for measuring the classification performance.
The accuracy reflects the quality of the classification task. It is defined as the ratio of the true positives (TP) and true negatives (TN) to the number of all predicted values including the TP, TN, false positives (FP), and true negatives (TN), as reflected by Equation (4) [69][70][71][72][73][74].
On the other hand, the RMSE measures the difference between each predicted output and the corresponding expected value by calculating the root of the ratio of the difference and the number of instances, as reflected by Equation (5).
where n is the number of instances, R i is the real value, P i is the predicted value, and P i is the mean of the predicted values.

Comparison Experiments without Feature Selection
In this section, word embedding is experimented without any feature selection technique using the RF, K-NN, AdaBoost, bagging, SVM, and REPtre techniques. The experiments are made on different datasets, including the medical gloves, hand sanitizer, face masks, and whole datasets. Table 3 shows the accuracy values achieved by conducting the results for the different datasets. In general, and according to the table, the Rf technique has the best results for the medical gloves, medical oxygen, and whole datasets. Looking deeply into the table, it shows that the RF technique has almost similar results to the bagging and REPtree techniques for all the datasets, and thus, the bagging and REPtree techniques slightly outperform the RF for the face masks and the hand sanitizer datasets, respectively. On the other hand, the AdaBoost technique also has similar results to RF for the hand sanitizer and medical oxygen datasets. Moreover, K-NN has a close accuracy value to RF for the whole datasets but is not competitive to Rf for the remaining datasets. SVM, on the other hand, has a non-competitive behavior to RF and almost all the other techniques. It also has a very low accuracy value for the face masks compared to the others. In summary, RF, REPtree, and bagging tend to have consistent behavior across all datasets, having better results than the other techniques. However, RF is recommended, since it has better results than bagging and REPtree for the majority of the datasets, although it shows slight enhancement. Table 4 shows the results obtained by RMSE for the same datasets and techniques. Similar observations can be made for RF, bagging, and RFPtree techniques. However, k-NN shows a slight enhancement for the face masks dataset. SVM also shows the worst results among all the datasets.

Evolutionary Algorithm Feature Selection
This section discusses the effect of adding the feature selection process using the HSA technique for different classification techniques including RF, k-NN, AdaBoost, bagging, SVM, and REPtree for the five datasets in the study. Table 5 shows the effect of the HSA on reducing the dimensions of the datasets. We can observe that the medical glove, hand sanitizer, medical oxygen, face masks, and whole datasets have reduced dimensions of 23,42,43,30, and 63, respectively. By comparing the values to the ones in Table 1, where every dataset has a dimension value of 400, it is observed that the dimension value has been highly reduced for all the datasets. Specifically, the number of features is reduced by 94.25%, 89.5%, 89.25%, 92.5%, and 84.25% for the medical glove, hand sanitizer, medical oxygen, face masks, and whole datasets, respectively. According to the reduced dimensions of the different datasets, Tables 6 and 7 show the obtained results of the accuracy and RMSE values, respectively, from experimenting with the same aforementioned classification techniques.
On one hand, Table 6 shows the accuracy values for all the datasets. It can also be concluded that the RF technique performed better than the other techniques. The same observation can be made for the bagging and the REPtree techniques, having close results to the RF technique. However, different observations can be made for the k-NN technique. This includes the enhanced accuracy values for the k-NN in all the datasets compared to the performance achieved by k-NN without considering the feature selection process, which can be recalled from Table 3. It also includes an identical performance for the K-NN compared to the RF for the medical oxygen dataset and almost similar results compared to RF for all the datasets. In addition, AdaBoost made the same observation for the hand sanitizer and medical oxygen datasets, having very close results to those for the RF technique. SVM still achieves the worst results among all other techniques.
On the other hand, Table 7 shows the RMSE values for all the datasets. Different observations can be made from this table, by the enhanced results observed by k-NN, having better performance than the other techniques, including RF, for medical gloves, medical oxygen, and face masks datasets. Bagging slightly achieved better results for the hand sanitizer dataset. In general, , REPtree, and k-NN have similar results, as observed from Table 6. In contrast, SVM shows the worst results among the other techniques and a relatively very high RMSE value for the face masks dataset.
To summarize, the process of reducing the number of features for the dataset does not only allow for low processing time achieved by the classification techniques but also enhances the performance of some low-performance techniques including the k-NN and keeping a competitive performance for the other classification techniques.

Discussion
The proposed approach consists of using word embedding as text representation and a harmony search algorithm as a feature selection method utilizing different classification models as an evaluation to improve the sentiment analysis prediction of medical products.
The implementation of word embedding as a text representation can benefit by selecting words with symmetrical meanings to have similar representations. Thus, reducing the number of presenting words such as the methods in Figure 4. We can notice from the table that some datasets reached nearly 7500 features, where most of these features are useless, unlike the word embedding method. Therefore, using word embedding can be more efficient, especially using the pre-trained word embedding technique through mapping our data with a learning corpus.  Moreover, the results of the proposed approach in the previous subsection highlight the improvement of using the HSA for feature selection. Such a mechanism showed excellent and competitive results using a fewer number of features, consequently decreasing the computational time of the experimental results. The reduction rates on the features are shown in Figure 5. It can be seen that the percentage of reduction reached about 90% on every dataset.  Figure 5. Features reduction rates.

Conclusions
This study proposes an intelligent hybrid technique using a harmony search algorithm for feature selection and different classifiers for evaluation. This paper studies customers' reviews regarding COVID-19 medical products on the Alibaba e-commerce website. Five different datasets have been prepared, namely medical gloves, hand sanitizer, medical oxygen, and face masks, while the fifth dataset is composed of merging all four datasets. After data collection and preprocessing, an advance pre-trained word embedding technique was applied for the tokenization process; the Word2Vec embedding technique was chosen as the main feature extraction model rather than the regular techniques to overcome the challenges of the data. This work follows a hybrid evolutionary approach, as it is based on a metaheuristic optimization algorithm, namely a harmony search algorithm, which is used for the feature selection process to select the best subset of features using different classifiers for evaluation. Six different classification techniques including RF, k-NN, AdaBoost, bagging, SVM, and REPtree are examined for classifying the sentiment dataset. The results include the accuracy and RMSE values of the classification process with the feature selection process and without it. The results show that applying the feature selection process reduced the number of features by 94.25%, 89.5%, 89.25%, 92.5%, and 84.25% for the medical glove, hand sanitizer, medical oxygen, face masks, and whole datasets, respectively. Competitive performance for the classification techniques and a decrease of the computational time of the classification process due to reducing the number of features are achieved by improving the evaluation measures' values of some low-performance techniques.