Classification of the Central Effects of Transcutaneous Electroacupuncture Stimulation (TEAS) at Different Frequencies: A Deep Learning Approach Using Wavelet Packet Decomposition with an Entropy Estimator

The field of signal processing using machine and deep learning algorithms has undergone significant growth in the last few years, with a wide scope of practical applications for electroen‐ cephalography (EEG). Transcutaneous electroacupuncture stimulation (TEAS) is a well‐established variant of the traditional method of acupuncture that is also receiving increasing research attention. This paper presents the results of using deep learning algorithms on EEG data to investigate the effects on the brain of different frequencies of TEAS when applied to the hands in 66 participants, before, during and immediately after 20 min of stimulation. Wavelet packet decomposition (WPD) and a hybrid Convolutional Neural Network Long Short‐Term Memory (CNN‐LSTM) model were used to examine the central effects of this peripheral stimulation. The classification results were analysed using confusion matrices, with kappa as a metric. Contrary to expectation, the greatest dif‐ ferences in EEG from baseline occurred during TEAS at 80 pulses per second (pps) or in the ‘sham’ (160 pps, zero amplitude), while the smallest differences occurred during 2.5 or 10 pps stimulation (mean kappa 0.414). The mean and CV for kappa were considerably higher for the CNN‐LSTM than for the Multilayer Perceptron Neural Network (MLP‐NN) model. As far as we are aware, from the published literature, no prior artificial intelligence (AI) research appears to have been conducted into the effects on EEG of different frequencies of electroacupuncture‐type stimulation (whether EA or TEAS). This ground‐breaking study thus offers a significant contribution to the literature. However, as with all (unsupervised) DL methods, a particular challenge is that the results are not easy to inter‐ pret, due to the complexity of the algorithms and the lack of a clear understanding of the underlying mechanisms. There is therefore scope for further research that explores the effects of the frequency of TEAS on EEG using AI methods, with the most obvious place to start being a hybrid CNN‐LSTM model. This would allow for better extraction of information to understand the central effects of peripheral stimulation.


Introduction
'Signals due to rhythmic stimulation... appear to reach parts of the central nervous system which are inaccessible to impulses set up by non-rhythmic stimuli, however intense' (William Grey Walter) [1].
Transcutaneous electroacupuncture stimulation (TEAS), a non-invasive variant of the ancient method of acupuncture that has been used since the 1990s. It is increasingly used in clinical practice, most commonly for pain management and in a range of musculoskeletal presentations, predominantly in China [2]. TEAS has also been shown, for example, to be effective in the treatment of stroke [3], post-operative nausea and vomiting [4], and for improving symptoms of insomnia and anxiety in opioid use disorder [5].
In contrast to classical acupuncture, TEAS does not involve any puncture of the skin or the use of any needles. It is therefore potentially advantageous for any patient who has a fear of needles (so-called needle phobia) or for whom skin puncture might be considered an unacceptable clinical risk. Ulett et al. (1998) [6] identified that the effects of classic electroacupuncture (via needles) are stronger and more profound than those achieved with manual acupuncture (employing a needle but with no electrical stimulation). Furthermore, electroacupuncture with surface electrodes was demonstrated to be as effective as needlebased electroacupuncture.
In functional magnetic resonance imaging (fMRI) studies, electroacupuncture has been shown to generate more widespread cerebral and sub-cortical changes than manual acupuncture [7,8], and thus, the use of TEAS may have real physiological advantages over classical manual acupuncture.
The safety concerns associated with acupuncture [9], although largely theoretical, can be ameliorated through the use of a surface electrode stimulating system (i.e., TEAS).
The potential for home-based, patient-delivered acupuncture may have significant advantages (reduced cost, and a lower clinical burden for both the patient and the clinician). TEAS makes a home-based delivery system a realistic proposition [10].
Based on a series of several small pilot studies conducted between 2011 and 2015, in 2016-2017, a larger study was conducted (N = 66) with the same primary objective, namely, to ascertain if electroacupuncture stimulation-whether applied using needles or transcutaneously-has frequency-specific effects on electroencephalography (EEG) and other physiological signals. In the current study, we also expect to see differences in the effects depending on participant age, gender, personality and mood, as well as in the subjectively reported intensity of stimulation. Our objective in this first neuroimaging report is to determine if there is a difference between the EEG at baseline (before stimulation), during transcutaneous electroacupuncture stimulation (TEAS) and after stimulation, and whether these differences vary with stimulation frequency.
To try to answer our research question, we decided to use deep learning (DL), a twenty-first-century method of data analysis that has evolved from machine learning (ML). The application of both of these artificial intelligence (AI) methods of data analysis has been increasing exponentially over the past decade, whereas research on acupuncture-related stimulation methods has grown steadily and more or less linearly over the same period ( Figure 1).

A Brief Overview of AI: Machine Learning (ML) and Deep Learning (DL) in EEG Analysis
As an experiment in learning, a series of nested literature reviews were conducted on or around the 5 December 2021. In the first of these, 2118 papers were located on PubMed.gov by using the search string 'EEG AND ("machine learning" OR "deep learning")'. Of the 2118 hits, almost a quarter (473, or 22.3%) included the terms 'epilepsy OR seizure*'. 138 (6.5%) of this subset of papers were reviewed, and 47 of these (34.1%, or more than a third) were on epilepsy or seizures.

A Brief Overview of AI: Machine Learning (ML) and Deep Learning (DL) in EEG Analysis
As an experiment in learning, a series of nested literature reviews were conducted on or around the 5 December 2021. In the first of these, 2118 papers were located on Pub-Med.gov by using the search string 'EEG AND ("machine learning" OR "deep learning")'. Of the 2118 hits, almost a quarter (473, or 22.3%) included the terms 'epilepsy OR seizure*'. 138 (6.5%) of this subset of papers were reviewed, and 47 of these (34.1%, or more than a third) were on epilepsy or seizures.
In contrast, only one of the 2118 papers found was on acupuncture [11], and none were on transcutaneous electrical nerve stimulation (TENS), although one was located that mentioned transcutaneous vagal stimulation [12]. Widening the search strategy to locate acupuncture or TENS studies using ML or DL methods, but not necessarily applying them primarily to EEG, a further useful review paper on acupuncture, ML and neuroimaging (including EEG) was located on PubMed [13]. Only one paper on electroacupuncture (EA) and ML was located [14], but ML was used here as a method of predicting clinical outcomes, not in the analysis of physiological signals. For bibliometric comparison, comparable searches were also made using SCOPUS, Elsevier's citation database [15] and the resources of CNKI (China National Knowledge Infrastructure, 中 国 知 网 ) (https://cnki.net/) (accessed on 17 February 2023), although the results of the latter appeared somewhat variable, depending on when the searches were conducted.
Based on the literature located using PubMed and other online sources, a brief overview of ML and DL methods used for EEG data analysis is provided in the online Supplementary Materials, Section SM1. It is not exhaustive and is intended simply to provide enough background information for those unfamiliar with the language of AI to understand the methods and results of our analysis.

Literature Review and Resulting Proposed Strategy
Combinations and comparisons of ML and DL algorithms were located in PubMedindexed papers using the search terms 'EEG AND [  In contrast, only one of the 2118 papers found was on acupuncture [11], and none were on transcutaneous electrical nerve stimulation (TENS), although one was located that mentioned transcutaneous vagal stimulation [12]. Widening the search strategy to locate acupuncture or TENS studies using ML or DL methods, but not necessarily applying them primarily to EEG, a further useful review paper on acupuncture, ML and neuroimaging (including EEG) was located on PubMed [13]. Only one paper on electroacupuncture (EA) and ML was located [14], but ML was used here as a method of predicting clinical outcomes, not in the analysis of physiological signals. For bibliometric comparison, comparable searches were also made using SCOPUS, Elsevier's citation database [15] and the resources of CNKI (China National Knowledge Infrastructure, 中国知网) (https://cnki.net/) (accessed on 17 February 2023), although the results of the latter appeared somewhat variable, depending on when the searches were conducted.
Based on the literature located using PubMed and other online sources, a brief overview of ML and DL methods used for EEG data analysis is provided in the online Supplementary Materials, Section SM1. It is not exhaustive and is intended simply to provide enough background information for those unfamiliar with the language of AI to understand the methods and results of our analysis.

Literature Review and Resulting Proposed Strategy
Combinations and comparisons of ML and DL algorithms were located in PubMedindexed papers using the search terms 'EEG AND  SVM  6  1  3  2  RF  5  1  3  1  LDA  1  0  1  0  LR  0  0  0  0  Clustering  1  n/a  n/a  1  PCA  0  0  0  0  Sum  14  2  7  4   DNN   SVM  6  4  2  0  RF  2  1  1  0  LDA  2  1  1  0  LR  0  0  0  0  Clustering  2  2  0  0  PCA  0  0  0  0  Sum  11  8  4  0  SVM  56  19  35  2  RF  31  7  20  4  LDA  13  2  11  0  LR  5  1  3  1  Clustering  15  5  1  9  PCA  3  1  2  0 Note: Clustering (or obvious synonyms) did not appear in some papers located in PubMed when using the search term 'cluster*,' while in some papers that did include clustering, it did not appear to be used as an ML method. For Logistic Regression (LR), one study did not provide results for either the combination or comparison of methods [16] NB: Not all (non-feature-based) DL methods outperformed (feature-based) ML methods (see some of the Support Vector Machine vs. Convolutional Neural Network (SVM vs. CNN) studies, for example). Table 2 shows the results of the PubMed searches for combinations or comparisons of the two DL algorithms, carried out on 5 December 2021. CNN-LSTM models are thus relatively common hybrids and are more common than LSTM-CNN, the inverse combination. CNN-RNN and LSTM-RNN are the next most common hybrid models. This was confirmed for DL algorithms in general (i.e., not restricted to EEG studies), using Google Scholar instead of PubMed (the abbreviations are defined in the caption of Table 1, and explained in the online Supplementary Materials, SM1). CNN-LSTM thus appears to be an appropriate hybrid model for the current study. For those unfamiliar with CNN and LSTM, a description is provided in the online Supplementary Materials (SM2.1 and SM2.2).

Literature Review of AI, Acupuncture and EEG
A brief review was conducted of studies indexed in three major online databases-PubMed, SCOPUS and CNKI-on machine learning (ML), Support Vector Machines (SVM), deep learning (DL) or Convolutional Neural Networks (CNN) and electroencephalography (EEG), acupuncture (Acup) or electroacupuncture (EA). SVM and CNN were selected as frequently used exemplars of ML and DL, respectively. The results are shown in Table 3. The percentage of all ML studies that mention SVM was thus lowest in PubMed (30.57%) and highest in SCOPUS (46.10%). Similarly, the percentage of all DL studies that mention CNN was lowest in PubMed (53.35%) and highest in SCOPUS (62.22%). In both PubMed and SCOPUS, a greater percentage of EEG studies mentioned SVM than CNN, but this was reversed in CNKI, suggesting geographical bias. The effects of EA, TEAS (and, indeed, acupuncture) are usually explained using neurochemical models, in which different regions of the brain play key roles [17,18]. In particular, stimulation at low, medium and high frequencies, or low or high amplitudes, may activate different pathways in the spinal cord and brain [18]. In brief, low-frequency stimulation (at 2-4 Hz) activates both large-and small-diameter afferents, and thus, has both segmental and supraspinal effects, with the release of enkephalin and beta-endorphin in the brain (less so in the spinal cord). These central effects may mean that any resulting analgesia has a slow onset and outlasts the stimulation itself. In contrast, high-frequency stimulation (at 50-200 Hz) activates predominantly the large-diameter afferents, so that the effects are segmental (associated with the release of dynorphin in the spinal cord), not supraspinal. Analgesia thus has a rapid onset but does not last long. Stimulation frequencies of 10-20 Hz may activate both mechanisms [19]. Numerous functional magnetic imaging (fMRI) studies have been conducted to explore these connections, but far less research has investigated the effects of acupuncture, EA or TEAS on EEG (315, 69 and 2 studies are currently indexed in PubMed, respectively), despite the advantages of EEG over other forms of neuroimaging, such as fMRI and magnetoencephalography (MEG) in terms of cost, portability and/or temporal resolution and usefulness for frequency analysis.

EEG Studies on Acupuncture and Related Modalities
Using the search string 'EEG AND (acupuncture OR transcutaneous) NOT ("vagal stimulation" OR "vagus nerve stimulation"),' about 500 studies were identified in PubMed. Of these, around 147, published between 1986 and 2022, were easily retrieved and could be examined in depth. Here, we consider those on steady-state EEG rather than evoked potentials. Of these, 56 (around 38%) were from China, 15 from Korea, 13 from the US, 11 from Japan and 10 from Taiwan. Other countries were represented by fewer than 10 studies each. Of the Chinese studies, 25 (more than 44%), or almost half, were from Hebei University of Technology (Tianjin University).
Most of these EEG studies were on manual acupuncture, and it should be remembered that 'TEAS is different from insertive electroacupuncture in many ways, and the results from these studies may not apply to acupuncture' [20]. In the acupuncture-related studies, the points most commonly used-as noted in a 2018 systematic review of 19 EEG acupuncture studies [21]-were ST36 (zusanli, 足三里, Zúsānlĭ), on the leg below the knee and lateral to the anterior crest of the tibia; LI4 (hegu, 合谷, Hégŭ), located in the area covered by the superficial branch of the radial nerve, and close to the radial artery or first dorsal metacarpal artery, i.e., on the back of the hand between the first and second metacarpals; and P6 (neiguan, 内关, Nèiguān), on the anterior surface of the forearm, proximal to the wrist crease between the palmaris longus and flexor carpi radialis tendons.
Methods of analysing changes in EEG were varied. Measures based on EEG power occurred in similar numbers of studies published before and after 2013, the median year of publication for the 147 studies located, as did nonlinear entropy and complexity measures. Only one study on cordance was located. Functional connectivity measures based on a graph or network theory-i.e., quantifying relationships between EEG at different electrodes [22]-were found in only one study before 2013, but in 13 of the 72 studies published since then.
Of the Tianjin studies located, 14 were on manual acupuncture, 10 on non-invasive magnetic stimulation (TMS-transcranial magnetic stimulation), one on moxibustion and one on 100 Hz microcurrent TEAS. Half the Tianjin acupuncture studies, published be-tween 2010 and 2021, investigated the effects of different frequencies of needle-twirling at ST36 in EEG. Participants were lying down with their eyes closed in a darkened room. Three different frequencies were used in the same session, with between 4 and 10 min rests between them, depending on the study. In contrast, only one upper limb TENS study, a BSc thesis from Holland, investigated the effects of stimulation frequency on EEG and did not use low-frequency stimulation.
As yet, there are only two studies in PubMed on artificial intelligence (whether machine learning or deep learning), EEG and acupuncture or transcutaneous stimulation (TENS or TEAS), with one of these being from Tianjin, as mentioned above [11]. Seven studies were also located in a separate PubMed search for 'acupuncture' AND 'wavelet' AND 'EEG'. Of these, four were on wavelet packet decomposition (WPD). Three of them were from Tianjin [23][24][25], and one from Shenyang [26]. A further study investigating the effects of 20-100 Hz transcutaneous brachial stimulation on EEG wavelet entropy was noted [27], but it did not involve AI or explore the effects of different stimulation frequencies. Thus, no prior AI research appears to have been conducted on the effects of different frequencies of stimulation (whether EA or TEAS) on EEG.

The Experiment
Ethics approval for the study was granted by the Health and Human Sciences Ethics Committee with the Delegated Authority of the University of Hertfordshire (UH)-Protocol number HSK/SF/UH/00124.
Sixty-six participants were recruited as a convenience sample from among healthy staff and students at the University, local complementary health practitioners and other contacts. After the completion of some online questionnaires and an explanation of the procedures to be followed, participants attended their first session. They attended four sessions in all, a week or more apart (except for four participants who dropped out after only one session, and another who only completed three sessions). All sessions were conducted in the University Physiotherapy Laboratory, although despite our best efforts, this could not be completely soundproofed, or temperature controlled.
Participants were seated upright in a comfortable chair, with both forearms supported. Informed consent was obtained, further paper questionnaires were completed, and the participants were then prepared for the session. This preparation, which took around 15 min, involved fitting an EEG cap with head movement sensors attached, and affixing electrocardiogram (ECG) electrodes to the forearms, as well as other sensors to the fingers of both hands. The EEG cap, ECG electrodes and other sensors were worn for the remainder of the session (usually around 60 to 90 min).
Following an initial 5 min Baseline recording ('time slot 1'), TEAS was applied for 20 min to each hand, with a short pause halfway through to allow for further questionnaires to be completed and participants to rest briefly ( Figure 2). Otherwise, during the whole procedure, participants were asked to keep their eyes open and focus gently on an object in front of them (to reduce eye movement artefacts). EEG recording continued during stimulation (Stim1-Stim4, or 'time slots 2-5'), which was between the acupuncture point LI4 (hegu), located on the dorsum of the hand between the 1st and 2nd metacarpal bones, and the ulnar border of the same hand. In other words, the current only passed between the electrodes on each hand, and did not flow through the arms and torso, so that, in principle, it should not affect the heart-or brain-directly.
After stimulation (and the completion of other questionnaires), the recording was continued for a further 15 min (Post1-Post3, or 'slots 6-8') to assess post-stimulation changes ( Figure 2). The electrodes and sensors were then removed, and further questionnaires were filled out before the participant left. After the laboratory experiment, 47 of the 66 participants completed further online questionnaires on several personality traits. procedure, participants were asked to keep their eyes open and focus gently on an object in front of them (to reduce eye movement artefacts). EEG recording continued during stimulation (Stim1-Stim4, or 'time slots 2-5′), which was between the acupuncture point LI4 (hegu), located on the dorsum of the hand between the 1st and 2nd metacarpal bones, and the ulnar border of the same hand. In other words, the current only passed between the electrodes on each hand, and did not flow through the arms and torso, so that, in principle, it should not affect the heart-or brain-directly. After stimulation (and the completion of other questionnaires), the recording was continued for a further 15 min (Post1-Post3, or 'slots 6-8′) to assess post-stimulation changes ( Figure 2). The electrodes and sensors were then removed, and further questionnaires were filled out before the participant left. After the laboratory experiment, 47 of the 66 participants completed further online questionnaires on several personality traits.
A charge-balanced Equinox E-T388 stimulator was used in all four sessions ( Figure  3), and in each session, was set at one of four different frequencies-2.5 alternating monophasic pulses per second (pps), 10 pps, 80 pps or 160 pps (the frequency or number of cycles of stimulation per second, in Hertz, was at half these values). For the three lower frequencies, the output amplitude was set to provide a 'strong but comfortable' sensation for the participant. In contrast, 160 pps was applied as a 'sham' treatment, with the device switched on (and a flashing light visible), but the output amplitude remaining at zero throughout-although a pretence was made of turning up the amplitude, out of sight of the participants. Nonetheless, the stimulation (at 80 and 160 pps) was visible as an interference pattern (envelope) on one of the screens, showing the recorded ECG (although hidden from participants' view so as not to distract them), and some participants were aware of a sensation in their hands at some moments during their sham session. The different stimulation frequencies for each participant were applied in a semi-randomised balanced order. A charge-balanced Equinox E-T388 stimulator was used in all four sessions (Figure 3), and in each session, was set at one of four different frequencies-2.5 alternating monophasic pulses per second (pps), 10 pps, 80 pps or 160 pps (the frequency or number of cycles of stimulation per second, in Hertz, was at half these values). For the three lower frequencies, the output amplitude was set to provide a 'strong but comfortable' sensation for the participant. In contrast, 160 pps was applied as a 'sham' treatment, with the device switched on (and a flashing light visible), but the output amplitude remaining at zero throughoutalthough a pretence was made of turning up the amplitude, out of sight of the participants. Nonetheless, the stimulation (at 80 and 160 pps) was visible as an interference pattern (envelope) on one of the screens, showing the recorded ECG (although hidden from participants' view so as not to distract them), and some participants were aware of a sensation in their hands at some moments during their sham session. The different stimulation frequencies for each participant were applied in a semi-randomised balanced order.

Data Collection
EEG data were collected for forty minutes (8 × 5-min 'slots') in each session. The 10/20 system of electrode location was used (19 electrodes with linked ears as reference and ground anterior to Fz). The data collection followed standard EEG procedures using Electro Cap International (ECI) caps (size selected individually for maximum comfort, according to participants' head dimensions), a Mitsar EEG-202 amplifier and WinEEG software (v2.91.54) (Mitsar Ltd., St Petersburg, Russia). The sampling rate was 500 Hz.

Data Analysis
Data were recorded initially in WinEEG ('.eeg') format (one file per session) and saved in '.edf' format, and then, each separate session file was cut into eight separate '.mat'

Data Collection
EEG data were collected for forty minutes (8 × 5-min 'slots') in each session. The 10/20 system of electrode location was used (19 electrodes with linked ears as reference and ground anterior to Fz). The data collection followed standard EEG procedures using Electro Cap International (ECI) caps (size selected individually for maximum comfort, according to participants' head dimensions), a Mitsar EEG-202 amplifier and WinEEG software (v2.91.54) (Mitsar Ltd., St. Petersburg, Russia). The sampling rate was 500 Hz.

Data Analysis 2.3.1. Data Pre-Processing
Data were recorded initially in WinEEG ('.eeg') format (one file per session) and saved in '.edf' format, and then, each separate session file was cut into eight separate '.mat' files (one for each 5 min 'slot').
Data were first filtered between 0.5 and 45 Hz using Matlab 2nd order Butterworth filters ('butter' and 'filtfilt' functions). Mains power in the UK is supplied at 50 Hz, so for higher-frequency signals, a second-order 50 Hz Butterworth notch filter was also used (49)(50)(51). Independent Component Analysis (ICA) was then conducted using the extended Infomax algorithm [28] to reject non-neural artifacts, together with the multiple artifact rejection algorithm (MARA) [29]. Components were labelled and artefactual components removed in conjunction with ICLabel [30], an EEGLab plug-in [31]. Subsequently, the Trimoutlier EEGLab plug-in [32] was used to remove epochs exceeding an individual amplitude threshold defined as +/− 3 SD (standard deviation) above the mean amplitude across all channels. Finally, data were re-montaged with the CSD Toolbox (ExtractMontage.m function) [33], using the Laplacian form of local average reference.
At this stage, data from several participants were excluded, either because of inadvertent differences in sampling rate (for four participants, in four sessions), missing or poorquality data or because recordings were cut short for one reason or another (e.g., discomfort from wearing the cap or having to take a trip to the bathroom). Files for 48 participants remained-192 for each 'slot', in .mat format-resulting in 1536 files in total. Figure 4 shows the data collection and pre-processing pipeline.

AI Analysis
The analysis was divided into two parts.
In Part A, wavelet packet decomposition (WPD), i.e., frequency decomposition of the wavelet packet transform (WPT) [34], was used to break down the EEG signal into eight (non-standard) bands, and the wavelet entropy features were extracted.

AI Analysis
The analysis was divided into two parts. In Part A, wavelet packet decomposition (WPD), i.e., frequency decomposition of the wavelet packet transform (WPT) [34], was used to break down the EEG signal into eight (non-standard) bands, and the wavelet entropy features were extracted.
In Part B, the analysis was more exploratory. DL algorithms-CNN-LSTM in Phases 1, 2 and 4, and MLP-NN in Phase 3-were applied with the features from Part A as inputs. The algorithms used were determined by the objective for each phase: to explore changes over time (either baseline to stimulation slots, or baseline/stimulation/post-stimulation) or differences among the standard EEG bands (delta, theta, alpha, beta, and gamma).
In general terms, the study used the Python-based 'TensorFlow' framework, which is popular and widely used for the training and inference of Deep Neural Networks. This framework provides an optimised environment for executing large-scale computations and can handle the complexities of training DL models.
The choice of hardware is dependent on the computational requirements of the deep learning models being studied and the resources available. Here, we used either a highperformance GPU (Graphical Processing Unit) or a TPU (Tensor Processing Unit). GPUs have been shown to significantly speed up the training time of deep learning algorithms compared to traditional CPUs (Central Processing Units). TPUs are custom-built by Google for deep learning and provide even faster training times compared to GPUs.

Methodology: Part A
Fast Fourier transform (FFT) is traditionally used to convert a time-domain signal (time series)-such as EEG-into a frequency domain signal. However, as is well known, FFT is not suitable for the analysis of non-stationary, non-Gaussian and nonlinear signals such as EEG. WPD, although less frequently used in ML EEG studies, is one of several methods that are more appropriate for such data [35], and is capable of dealing with both low-and high-frequency signal components (unlike the wavelet transform itself, which has good temporal resolution but poor frequency resolution at high frequencies, and good frequency but poor time resolution at low frequencies [36]). Using WPD, the EEG signal is decomposed into a pre-selected number of frequency bands using multiple mirror filters in a binary tree structure. Different algorithms are possible when implementing WPD. In all four phases of our analysis, we chose to use a cost function based on 'energy entropy' to quantify the error between predicted values and expected values in the algorithm.
Wavelet packet transform was applied for each group of data, adopting a MATLAB (Matrix Laboratory) multisignal WPD feature extraction code developed by Khushaba et al. [37][38][39]. Since the acquired EEG signals were sampled at 500 Hz, the number of samples selected per window to extract features was 500, and the spacing of the windows (the increments between them) was chosen to be 25. The decomposition level used was 7 [40]. For a full tree topology at this level, for each of the 19 EEG electrodes, 255 features were extracted in total using a high-low bandpass filter bank [41], resulting in a WPT feature matrix (n, (255 × 19)). Wavelet entropies were then evaluated from the WPT feature matrix to reduce the size of the data and obtain a strong biomarker for classification. At this stage, the feature matrix was separated into fractions, with 255 samples for each of the 19 electrodes. The size of each of the 19 resulting submatrices was thus [n,255]. The parameter-free wavelet entropy code used here is based on that developed by Rosso et al. [42]. The wavelet filter selected was 'Coiflet-4', and the decomposition level was 7, based on results from our previous study on discrete wavelet transform (DWT), in which we found that Coiflet performance was better when compared to that of other wavelet families [43]. When the wavelet entropy algorithm was applied to these matrices, (255*19)sized feature matrices, containing the entropy values, were acquired for every two classes. Finally, these feature matrices were fed into a 1D CNN-LSTM hybrid classifier to generate the classification model (Phases 1, 2 and 4).
In our EEG study, four different frequencies of peripheral electrical stimulation were used: a 'sham' at 160 Hz but a very low amplitude ('0'), and 2.5 Hz, 10 Hz and 80 Hz at above the sensory threshold. For the wavelet packet decomposition of signals for the four types of stimulation frequency (SF), 100 non-overlapping samples were extracted from the EEG data for each type of SF, with each sample containing 5000 data points (recordings for each class were cropped and data sizes balanced). One sample for each type of SF was processed with WPD, using the Daubechies 2 (DB2) wavelet basis function [44]. Each sample was decomposed into eight frequency bands (FB), from FB1 to FB8. Finally, these feature matrices were fed into a Multilayer Perceptron Neural Network (MLP-NN) classifier to generate the classification model in Phase 3.

Methodology: Part B
We divided our exploratory DL analysis into four phases, each designed with a particular objective in mind:

1.
To determine whether EEG during TEAS differed from EEG at baseline, and whether such differences were dependent on stimulation frequency; 2.
To investigate how EEG differed before, during and following TEAS, and whether such differences were dependent on stimulation frequency; 3.
To determine whether differences among the EEG bands vary with both time (baseline, stimulation or post-stimulation) and stimulation frequency; 4.
The objective here was the same as in Phase 3, but with the addition of comparing how the MLP-NN and CNN-LSTM models performed.
Methodology: Part B, Phase 1 For our analyses, here, we only processed data recorded at 500 Hz, although this did include some quite noisy recordings, resulting in 1806 or 1807 files for each filtered EEG band (delta to gamma). In this phase, the data from all five EEG bands were considered together rather than separately. The data were then standardised using the StandardScaler() command in the Python scikit-learn library [45]. Our objective was to determine whether EEG during TEAS differed from EEG at baseline, and whether such differences were dependent on stimulation frequency.
The initial analysis was conducted using a Sequential Application Programming Interface (API) in the 'Keras' DL framework [46], and a hybrid CNN-LSTM model similar to that illustrated in Figure S5 in the online Supplementary Materials, as shown in Table 4.
The outputs were evaluated using standard confusion matrix metrics, of which sensitivity and specificity are probably the best known. Here, we focused on accuracy, kappa and the area under the 'receiver operating characteristic' curve' (ROC curve), known as the AUC or ROC-AUC [47]. If the mean overall accuracy was <0.33, the results were considered non-significant.
Sixteen models were created, to examine changes over time (Slots 2, 3, 4 and 5) relative to the baseline (Slot 1), for each of the four stimulation frequencies.
Methodology: Part B, Phase 2 The feature extraction methodology utilised here is based on a combination of the methods used in our previously published papers [41,48,49], with data standardised using the StandardScaler() command in the Python scikit-learn library. Standardising the input data can help to reduce the effect of outliers or extreme values and improve the performance of the machine learning model. Some algorithms are particularly sensitive to the scale of the input data (see Supplementary Materials), and thus, may perform better when the data are standardised. Our objective was to investigate how EEG differed before, during and following TEAS, and whether such differences were dependent on stimulation frequency. As in Phase 1, the analysis was conducted using a Sequential API in Keras and a hybrid CNN-LSTM model, but this time, omitting the Dropout layers and adding a further LSTM layer (Table 5). Omitting the Dropout layers and adding a further LSTM layer can potentially improve a model's performance on the training data (but can also lead to overfitting if the model is not properly regularised). It is generally considered good practice to try a variety of different model architectures and regularisation techniques to find the best balance between model complexity and generalisation to new data.
Twenty models were created, to examine changes over time (baseline (Slot 1), stimulation (Slots 2-5) and Post-stimulation (Slots 6-8)) in each of the five EEG bands (delta, theta, alpha, beta and gamma) for the four stimulation frequencies.
Methodology: Part B, Phase 3 Here, we did not use Keras and the DL hybrid CNN-LSTM algorithm, but a (relatively) shallow learning method based on Multilayer Perceptron (MLP), a scaled conjugate backpropagation MLP Neural Network (NN) with 10 hidden layers, available as standard in the MATLAB Deep Learning toolbox [50]. Inputs were taken from the 19 electrodes, and the output was provided as a 5 × 5 confusion matrix for the five EEG bands. Our objective here was to determine whether differences among the EEG bands vary with both time (baseline, stimulation or post-stimulation) and stimulation frequency. Table 5. Model summary for Phase 2. Note that 3-fold cross-validation was used, and SoftMax was used rather than the Sigmoid activation function in Phase 1, with a far greater number of parameters for each layer than before (these terms are explained in the online Supplementary Materials).  Figure 5 shows the model architecture for Phase 3. Features extracted from WPD and wavelet entropy analysis were selected as inputs. 'W' and 'b' are learnable parameters; 'W' corresponds to the weights of the Neural Network, and 'b' to the bias values [51]. The data were randomly divided into training, validation and test sets using the MATLAB 'dividerand' function. The Scaled Conjugate Gradient training method was applied (using the MATLAB 'trainscg' function), and cross-entropy was used to evaluate performance. Information was also provided on algorithm performance: the number of iterations required, processing time, performance, gradient, and the maximum number of validation checks to be conducted (held at 6, the default). If the gradient was <10 ′ 5, or the number of checks reached 6, training was stopped. Here, we did not use Keras and the DL hybrid CNN-LSTM algorithm, but a (relatively) shallow learning method based on Multilayer Perceptron (MLP), a scaled conjugate backpropagation MLP Neural Network (NN) with 10 hidden layers, available as standard in the MATLAB Deep Learning toolbox [50]. Inputs were taken from the 19 electrodes, and the output was provided as a 5 × 5 confusion matrix for the five EEG bands. Our objective here was to determine whether differences among the EEG bands vary with both time (baseline, stimulation or post-stimulation) and stimulation frequency. Figure 5 shows the model architecture for Phase 3. Features extracted from WPD and wavelet entropy analysis were selected as inputs. 'W' and 'b' are learnable parameters; 'W' corresponds to the weights of the Neural Network, and 'b' to the bias values [51]. The data were randomly divided into training, validation and test sets using the MATLAB 'dividerand' function. The Scaled Conjugate Gradient training method was applied (using the MATLAB 'trainscg' function), and cross-entropy was used to evaluate performance. Information was also provided on algorithm performance: the number of iterations required, processing time, performance, gradient, and the maximum number of validation checks to be conducted (held at 6, the default). If the gradient was < 10-5, or the number of checks reached 6, training was stopped. In our final experiment, we reverted to using the CNN-LSTM model in Keras, as in Phase 1, but this time, replacing the Dropout layers with max pooling (downsampling) layers, with three LSTM layers (as in Phase 2) and 5-fold rather than 3-fold cross-validation (see the online Supplementary Materials for an explanation of these terms). The objective was the same as in Phase 3, but we also wished to compare how the MLP-NN and CNN-LSTM models performed.

Model
Different DL models were developed for each phase to accommodate the different numbers of classes in the models, which were used for different purposes in each phase. In our final experiment, we reverted to using the CNN-LSTM model in Keras, as in Phase 1, but this time, replacing the Dropout layers with max pooling (downsampling) layers, with three LSTM layers (as in Phase 2) and 5-fold rather than 3-fold cross-validation (see the online Supplementary Materials for an explanation of these terms). The objective was the same as in Phase 3, but we also wished to compare how the MLP-NN and CNN-LSTM models performed.
Different DL models were developed for each phase to accommodate the different numbers of classes in the models, which were used for different purposes in each phase. Our overall aim was to produce useful solutions to problems rather than developing models and making comparisons.

Results: Part A
The feature extraction results from Part A were provided as inputs to the CNN-LSTM hybrid classifier (Part B, Phases 1,2 and 4) and to the MLP-NN algorithm (Part B, Phase 3). These results are therefore not reported separately.

Results: Part B 3.2.1. Part B, Phase 1
As a summary metric, the values of kappa were calculated based on the binary confusion matrix results obtained. The results are shown in Table 6. These were based on a multi-class evaluation [52], considering each class as a separate binary classification problem, and then, combining the results to give an overall evaluation of the classifier's performance on the multi-class task. Thus, the results differ from those obtained using Vanetti's online calculator.  The CV (Coefficient of Variation) (i.e., SD/mean) for kappa is 0.134, and the mean 0.419 (SD 0.056). Thus, disregarding any effects on the individual EEG bands, both the greatest and smallest differences from baseline occurred in Slots 3, 4 and 5; the greatest differences occurred during 80 pps or sham stimulation, and the smallest differences during 2.5 or 10 pps. According to the guidelines suggested by Landis and Koch [53], values of kappa between 0.21 and 0.40 could be considered as 'fair', those between 0.41 and 0.60 as 'moderate', those between 0.61 and 0.80 as 'substantial', and those between 0.81 and 1.00 as 'perfect'. Here, more than half could be considered as 'moderate,' but none as 'substantial.' In other words, the model performs reasonably well, although it does not provide detailed information about the EEG changes that occur. The processing time for Phase 1 was approximately 5 h.

Part B, Phase 2
As a summary metric, the values of kappa were calculated based on the 3 × 3 confusion matrix results obtained (Slot1/Slots 2-5/Slots 6-8) for each of the 20 (5 band × 4 stimulation frequency) models. The results are shown in Table 7.
The CV (Coefficient of Variation) for kappa is 0.467, and the mean 0.506 (SD 0.236). Thus, when taking the EEG bands into account, the greatest differences among Slot 1/Slots 2-5/Slots 6-8 occurred for 2.5 pps in the Theta, Alpha and Gamma bands, and 80 pps in Alpha. The smallest differences occurred for the sham in the Delta, Beta and Gamma bands, for 10 pps in the Alpha, Beta and Gamma bands, and for 80 pps in Delta. Here, five values of kappa could be considered moderate, three as substantial, and two as very good indeed ('perfect'). The processing time for Phase 2 was approximately 8 h. As a summary metric, the values of kappa were calculated based on the 5 × 5 confusion matrix results obtained (Alpha, Beta, Delta, Gamma and Theta) for each of the 12 (3 time (Slot 1/Slots 2-5/Slots 6-8) × 4 stimulation frequency) models. The results are shown in Table 8. The CV for kappa is 0.054, and the mean 0.660 (SD 0.036). Differences among the EEG bands are marginally greater for 2.5 pps at baseline and post-stimulation, as well as the sham at baseline, and marginally less for 10 pps during and post-stimulation, as well as for 80 pps post-stimulation. According to the guidelines of Landis and Koch [52], these differences are all 'substantial', apart from those for 80 pps post-stimulation, which are 'moderate'. The processing time for Phase 3 is estimated to be around 5 h.

Part B, Phase 4
As a summary metric, the values of kappa were calculated based on the 5 × 5 confusion matrix results obtained (Alpha, Beta, Delta, Gamma and Theta) for each of the 12 (3 time (Slot1/Slots 2-5/Slots 6-8) × 4 stimulation frequency) models. The results are shown in Table 9. The CV for kappa is 0.137, and the mean 0.850 (SD 0.116). Using this approach rather than the MLP-NN algorithm, differences among the EEG bands are again greater for 2.5 pps at baseline and less for 10 pps post-stimulation, but otherwise, there is little agreement between the two methods. Kappa is greatest for the sham and 2.5 pps at baseline, for 10 pps during stimulation, and for 80 pps post-stimulation. According to the guidelines of Landis and Koch [53], 8 of the differences are 'perfect' (>0.81) and the remainder 'substantial'. The processing time for Phase 1 is again estimated to be around 5 h. Figures 6-9 provide examples of outputs (train and test accuracy or ROC plots, and confusion matrices) for the models in each phase with the highest value of kappa. Graphs of model accuracy and loss are shown over 100 epochs, where each epoch represents a full pass through the training dataset. Test loss is a measure of how well the model can make predictions on data it has not seen before (i.e., the test set). Ideally, the test loss will decrease over time, and the accuracy will increase, indicating that the model is learning and improving. (AUC) is a measure of the classifier's performance (the AUC ranges in value from 0 to 1, with a higher value indicating a better classifier).   Here, macro-and micro-average AUC are the same (the former computes the metric independently for each class, and then, takes the overall average, while the latter aggregates contributions from all classes). (AUC) is a measure of the classifier's performance (the AUC ranges in value from 0 to 1, with a higher value indicating a better classifier).       Each confusion matrix evaluates the performance of a classification algorithm, with each row representing instances in a predicted class, and each column instances in an actual class. The entry in the top-left corner represents the number of instances that were correctly predicted to be in the first class, and so forth.
The receiver operating characteristic (ROC) curves show the performance of binary classifiers as their discrimination thresholds are varied. The ROC curve plots the truepositive rate (also called sensitivity or recall) on the y-axis and the false-positive rate (1specificity) on the x-axis. A classifier with a higher true-positive rate and a lower falsepositive rate will be higher and further to the left on the curve. The area under the curve (AUC) is a measure of the classifier's performance (the AUC ranges in value from 0 to 1, with a higher value indicating a better classifier).

Discussion
Deep learning (DL) methods have been widely used in various fields, including medical research. In recent years, DL has been applied to acupuncture-related research, providing new insights and understanding into the effects of acupuncture on the human body. The application of DL to acupuncture-related research presents several unique challenges, such as the limited availability of high-quality data, the complex and nonlinear relationships between acupuncture points and physiological responses, and the need to consider the potential biases and confounding factors in the data.
Despite these challenges, the application of DL to acupuncture-related research has the potential to greatly advance our understanding of the mechanisms and effects of acupuncture, as well as its clinical applications. By leveraging the power of DL algorithms, researchers can analyse and model large, complex datasets, identify patterns and relationships in the data that are not easily apparent through traditional statistical methods, and make predictions about the effects of acupuncture on various physiological responses.
Based on a literature review, the authors of this study provide background information on artificial intelligence as used in EEG analysis, with an introduction to machine learning and deep learning methods for those-especially clinicians such as acupuncturists and physiotherapists-who may be unfamiliar with them. Summaries of the advantages and disadvantages of both ML and DL approaches are included, and also of some of their more commonly used algorithms. In addition, a literature review of EEG studies on acupuncture and related modalities was conducted. Based on these reviews, which, in themselves, provide a useful contribution to the literature, the authors used a combination of CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) algorithms, as well as WPD (wavelet packet decomposition) for feature extraction.
The experimental set-up was described, including TEAS, EEG data collection and preprocessing. We analysed the EEG data collected in four different ways (Phases 1 to 4): Phase 1. Sixteen hybrid CNN-LSTM models were created in Keras, to examine changes over time during stimulation (Slots 2, 3, 4 and 5) relative to the baseline (Slot 1), for each of the four stimulation frequencies, but without examining the filtered EEG bands separately. This resulted in 2 × 2 confusion matrices.
Phase 2. Twenty hybrid CNN-LSTM models were created in Keras, to examine changes over time (baseline (Slot 1), stimulation (Slots 2-5) and post-stimulation (Slots 6-8)) in each of the five EEG frequency bands (delta, theta, alpha, beta and gamma) for the four stimulation frequencies. This resulted in 3 × 3 confusion matrices.
Phase 3. Twelve scaled conjugate backpropagation MLP-NN models with 10 hidden layers were created using MATLAB, to examine differences between the EEG bands at baseline, and during and following stimulation, for the four stimulation frequencies. This resulted in 5 × 5 confusion matrices.
Phase 4. Here, we reverted to using the CNN-LSTM model in Keras, rather as in Phase 1, but with more LSTM layers. The objective was to examine the same differences as in Phase 3, so that the two very different methods could be compared. Again, this resulted in 5*5 confusion matrices.
As with all (unsupervised) DL methods, however useful they may be in identifying and classifying differences, the results are not easy to interpret due to the complexity of the algorithms and the lack of a clear understanding of underlying mechanisms. This can be a major challenge in any study. Another potential challenge is that models may be prone to bias if the training data reflect biased patterns. A third challenge is in determining how to achieve computational efficiency.
The Phase 1 analysis appears to show that the greatest differences from the baseline occurred during 80 pps or sham stimulation, and the smallest differences during 2.5 or 10 pps stimulation. These results are plausible, if diametrically opposed to those that might be expected from the literature [17][18][19].
The phase 2 analysis suggests that differences between baseline, stimulation and poststimulation EEG are greatest for 80 pps TEAS in the alpha band, and for 2.5 pps TEAS in the gamma band, with the smallest effect for 10 pps in alpha. The values of kappa showed greatest variance in Phase 2 analysis. Without knowing whether (and at which electrodes) these differences indicate increases or decreases in band power, or some other associated feature, these findings are hard to interpret. Would 2.5 pps TEAS be experienced as more stressful than 80 pps, for example, so that gamma power might increase with 2.5 pps stimulation, but alpha power with 80 pps? Further research is required to disentangle questions such as this.
The results of the Phase 3 analysis do not indicate major differences between any of the models, with the greatest differences among EEG bands at baseline for the sham and 2.5 pps TEAS, as well as for 2.5 pps post-stimulation. The Phase 4 results are quite different, with greatest differences among EEG bands, again, at baseline for 2.5 Hz TEAS, during stimulation for 10 pps and post-stimulation for 80 pps. However, differences among bands are to be expected, are not necessarily the result of stimulation and could be explained in many ways. None of the results from Phases 3 or 4 shed any light on the effects of stimulation frequency. It is of interest, though, that the mean and CV for kappa were considerably higher for the CNN-LSTM than for the MLP-NN model. This assemblage of results provides a further useful contribution to the literature. However, in a world of limited resources that are increasingly under stress on many levels, an important general question is whether greater accuracy and precision should be prioritised over the energy-information costs incurred (solving a problem with a shallow structured network is always more advantageous in terms of computational burden if it can be solved). In what circumstances is a slightly fuzzy classification 'good enough'? Here, the two models give different results, so perhaps, regardless of cost, those from the more computationally demanding model (CNN-LSTM) should be adopted, although which represents 'ground truth' is still a moot point. This may not always be the appropriate decision, and some may consider that the outcomes of this study do not justify the amount of energy and time it has taken to complete (almost 24 h in computation time for the AI algorithms alone, with another 12 h, approximately, for additional computation conducted using Google Colab in the cloud network). In conclusion, the software and hardware platforms used in deep learning operation are critical. Here, they were carefully selected to ensure accurate and reliable results. The inevitable human 'wear-and-tear' toll taken by intensive, collaborative research work should also be considered.

Some Limitations
The data were recorded in imperfect circumstances, in a laboratory that was not soundproofed or temperature-controlled. Nonetheless, external noise was minimised as far as possible, and an attempt was made to keep the space at a comfortable temperature over the course of the year during which data were collected.
Our 'sham' (160 pps) TEAS was not completely without physiological effect, which may have biased our results. In retrospect, although various sham stimulation methods have been explored over the years by different researchers, some subthreshold and some suprathreshold [54][55][56], we should have amended our own experimental set-up to ensure no current whatsoever reached the participants. Unfortunately, we made a false assumption based on initial pilot experiments in which sham stimulation was, indeed, subthreshold for the test participants. This was not so for all those who took part in our final study. Nonetheless, the output was set to 'zero' on the Equinox device, so considerably lower during this imperfect sham stimulation than during the 'active' stimulations.
Only 66 participants took part in this study-a small dataset for a DL study. However, the use of training, validation and test sets, and of 5-fold validation, should have compensated for this.
In this paper, we did not tackle the question of whether our findings were the result of neural or volume conduction, or whether they indicated a central frequency-following response to peripheral stimulation. Moreover, our analysis did not investigate the EEG electrode-specific effects of TEAS, nor, indeed, the effects over different scalp regions. In addition, we did not explore how EEG might change during and following TEAS at different frequencies.
Unsupervised DL methods suffer from the problem of interpretability. This was exacerbated in the present study by communication difficulties between the clinicians and computer scientists involved, whom all had very different skills, mindsets, interests and languages. This project provided us all with a challenging and immersive learning experience. We hope not too many misinterpretations remain.

Conclusions
The application of DL to acupuncture-related research is a step change in the field and has the potential to greatly advance our understanding of the mechanisms and effects of acupuncture, as well as its clinical applications. By leveraging the power of DL algorithms, researchers can analyse and model large, complex datasets, identify patterns and relationships in the data that are not easily apparent through traditional statistical methods, and make predictions about the effects of acupuncture on various physiological responses.
This study is the first of its kind to use artificial intelligence to explore the effects of TEAS frequencies on EEG. From the published literature, no AI research appears to have been conducted into the effects on EEG of different frequencies of electroacupuncture-type stimulation (whether EA or TEAS), although there are several studies on the effects of manual needle rotation frequency from Tianjin University. Additionally, from the published literature, both WPD and the hybrid CNN-LSTM model appear to be appropriate methods of examining the central (EEG) effects of peripheral stimulation (TEAS). Using these methods, we found-contrary to expectation-that the greatest differences in EEG from baseline occurred during 80 pps or the 'sham' (160 pps) TEAS applied to the hands), with a mean kappa of 0.454 and 0.467, respectively, while the smallest differences occurred during 2.5 or 10 pps stimulation (mean kappa: 0.393 and 0.360). On the other hand, when taking the EEG bands into account, the greatest differences among Slot 1 (baseline)/Slots 2-5 (stimulation)/Slots 6-8 (post-stimulation) occurred for 2.5 pps TEAS in the Theta, Alpha and Gamma bands, and for 80 pps TEAS in Alpha (mean kappa 0.506). Even higher values of kappa were obtained from differences among the EEG bands before, during and after TEAS at different frequencies, but this result was difficult to interpret and explain, and warrants further exploration in future studies.

Future Directions
There are many potential avenues for further research based on the findings of this paper. Possible approaches could include conducting additional experiments to confirm or refute the findings of this study, as well as using different algorithms, different frequencies of TEAS, or different subject populations. Further research is planned using conventional methods of EEG analysis, different frequency bands (e.g., narrow bands centred on the stimulation frequencies), as well as ML methods based on careful feature selection, in order to see if the results obtained here can be replicated or improved-or, indeed, explained. Such features will include several connectivity (or graph theoretical) measures, including those of a source localisation method such as sLORETA (standardised low-resolution brain electromagnetic tomography), to investigate whether findings are due to neural or volume conduction, or, indeed, both. Changes over time both during and after stimulation should also be investigated for different TEAS frequencies. Changes due to volume conduction effects would only occur during, not after stimulation.
Because of the potentially large number of features that could be examined, automated feature selection is an option for use in this further investigation. EEG cordance and some topological measures have been computed for the current dataset. Although these results remain unpublished as yet, they may also be useful in guiding feature studies.
Furthermore, particular attention could be paid to entropy measures, whether in the time, frequency or spatial domain, as well as wavelet-based entropy using different entropy estimators, such as discrete wavelet entropy or permutation entropy. These entropy measures could potentially provide useful insights into the effects of different frequencies of TEAS on the brain, by quantifying the changes in the degree of disorder or uncertainty in the EEG signal.
Furthermore, future research protocols (1) could use EEG with a greater numbers of electrodes, (2) should ensure that 'sham' treatment is genuinely sham, and (3) could make use of further methods of data augmentation to strengthen effects. There is therefore scope for new research, such as that published here, that explores the effects of the frequency of TEAS on EEG using AI methods, with the most obvious place to start being a hybrid CNN-LSTM model. WPD also appears potentially suitable as a feature extraction method that could be used in conjunction with this type of DL model, if required (although, of course, one of the advantages of CNN is that feature extraction is performed by the algorithm itself, without prior handcrafting of features).

Supplementary Materials:
The following supporting information can be downloaded at: https://ww w.mdpi.com/article/10.3390/app13042703/s1, S1. A brief overview of AI: machine learning (ML) and deep learning (DL) in EEG analysis: S1.1. A rough sketch of ML methods used for EEG data analysis; S1.2. Input and output in ML methods used for EEG data analysis; S1.3. Algorithm hyperparameters in ML; S1.4. A rough sketch of DL methods used for EEG data analysis; S1.5. Input and output in DL methods used for EEG data analysis; S1.6. Algorithm hyperparameters in DL. S2. The CNN-LSTM hybrid model: S2.1. A brief description of CNN; S2.2. A brief description of LSTM. Refs.  are cited in the supplementary materials.  Informed Consent Statement: Informed consent was obtained from all participants involved in the study. Participants were informed about the study and signed a consent form with the explicit agreement that their anonymised data would be retained for further analysis by the research team, and also shared with the ongoing Human Brain Indices (HBI) reference database.

Data Availability Statement:
The EEG data presented in this study will thus soon be freely available in the ongoing Human Brain Indices (HBI) reference database (https://www.hbimed.com) (accessed on 13 December 2022), and in Open Research Data Online (ORDO), The Open University's searchable research data repository at https://ordo.open.ac.uk/ (accessed on 13 December 2022).

Acknowledgments:
We thank the University of Hertfordshire for permitting us to conduct this study and for facilitating recruitment; Lidia Zaleczna and Aiste Noreikaite for the hours they spent carefully collecting the EEG data; and Paul Steinfath for his invaluable assistance with pre-processing the EEG data. We also thank our volunteers for their participation; to our families and partners for their continued patience and support; and many other colleagues for their discussions and other input that helped to shape the study, in particular, Neil Spencer (Professor of Applied Statistics) and Iosif Mporas (Reader in Signal Processing and Machine Learning) at the University of Hertfordshire. Finally, we thank the Acupuncture Association of Chartered Physiotherapists (AACP) and DM's patients, whose financial support indirectly made this study possible.

Conflicts of Interest:
The authors declare no conflict of interest. The Acupuncture Association of Chartered Physiotherapists had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.