3.1. Experimental Data Recording
Behavioral data were collected during the experiment mainly using the E-Prime software. The data comprised the reaction time, i.e., the time subjects needed to respond to the target stimulus, and the accuracy rate of the subjects' selections.
3.1.1. Reaction Time Analysis
Reaction time data were collected automatically from the key presses subjects made in response to the target stimulus. In this experiment, reaction time refers to the time subjects needed to judge the degree of semantic matching of the image words (image consistent, image neutral or image inconsistent) after the image words appeared in the target phase.
For the reaction time, a repeated-measures analysis of variance (RMANOVA) was performed with image word pairs and semantic matching as factors. Image word pairs were used as intergroup factors and semantic matching as intragroup factors. The results show that image word pairs had a significant main effect, F (2, 18) = 168.126, p < 0.001. Semantic matching also had a significant main effect, F (2, 18) = 184.873, p < 0.001. The interaction between image word pairs and semantic matching was not significant, F (4, 16) = 2.845, p = 0.059. With respect to semantic matching, the subjects’ reaction time was longest when the semantic value of the image was neutral. When the semantic value of the image was ambiguous, subjects consumed more cognitive resources to make a judgment, which required more thinking time.
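To make the F statistics above easier to read, the following is a minimal sketch of a one-way repeated-measures ANOVA over the three levels of semantic matching. It is not the procedure used in the study, which crossed two factors in statistical software; the function name `rm_anova_oneway` and the reaction times are hypothetical and serve only to illustrate how the effect and error terms are formed.

```python
from statistics import mean


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA.

    data: list of per-subject lists, one value per condition.
    Returns (F, df_effect, df_error).
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of conditions (levels)
    grand = mean(v for row in data for v in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]

    # Partition the total sum of squares into condition, subject and error parts.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Hypothetical reaction times in ms (consistent, neutral, inconsistent).
rts = [[600, 720, 640], [580, 700, 650], [620, 760, 660], [590, 730, 630]]
f_value, df1, df2 = rm_anova_oneway(rts)
print(f"F({df1}, {df2}) = {f_value:.3f}")
```

With n subjects and k levels, the uncorrected degrees of freedom are k − 1 and (k − 1)(n − 1).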
3.1.2. Accuracy Analysis
In the experiment, if the subject’s assessment of the degree of semantic matching of the stimulus material agreed with the semantic value classification, the answer was considered correct. Otherwise, it was considered incorrect. According to the accuracy statistics, the accuracy rate for image consistent was slightly higher than that for image inconsistent, while the accuracy rate for image neutral was the lowest.
The accuracy rate was analyzed using an RMANOVA with image word pairs and semantic matching as factors. Image word pairs were used as intergroup variables and semantic matching as intragroup variables. The results showed that image word pairs had a significant main effect, F (2, 18) = 23.941, p < 0.001; semantic matching also had a significant main effect, F (2, 38) = 19.924, p < 0.001. There was a significant interaction between image word pairs and semantic matching, F (4, 76) = 3.326, p = 0.014. Subjects had a higher accuracy rate when they judged image consistent stimulus materials, and the accuracy rate was lowest when they judged image neutral stimulus materials. However, even for the image neutral stimulus materials, the accuracy rate was still 77.8%.
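The scoring rule described above (a response is correct when it agrees with the semantic value classification) can be sketched as follows. The trial tuple layout and the function name `accuracy_by_condition` are assumptions made for illustration, not the E-Prime export format.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Per-condition accuracy rate.

    trials: iterable of (condition, response, expected) tuples, where
    `expected` is the semantic value classification of the stimulus.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, response, expected in trials:
        totals[condition] += 1
        if response == expected:   # response agrees with the classification
            hits[condition] += 1
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical trials for illustration only.
trials = [
    ("consistent", "match", "match"),
    ("consistent", "match", "match"),
    ("neutral", "match", "neutral"),
    ("neutral", "neutral", "neutral"),
    ("inconsistent", "mismatch", "mismatch"),
]
print(accuracy_by_condition(trials))
```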
3.2. ERP Experimental Data Analysis
The ERP data analysis followed the “start–target” paradigm, which was mainly divided into the start phase and the target phase. These two phases included the analysis of the brain region activation sites and the analysis of ERP components, respectively.
The start phase refers to the time period during which the vehicle stimulus was presented on the screen. During this period, subjects recognized the stimulus material and simultaneously compared it with the visual vocabulary in the experimental instructions. This period therefore involves the allocation of attentional resources, semantic processing and various other cognitive processes.
The target phase refers to the period of the presentation of the image vocabulary in which subjects must judge whether the stimuli presented in the start phase match the image vocabulary in the target phase and quickly press the keyboard to respond. This period mainly involves the subjects’ cognitive activities, such as semantic processing and working memory updating.
3.2.1. ERP Data Pre-Processing
Curry 8.0 software was used for data processing in the experiment. The M1 and M2 electrodes at the mastoids were set as reference electrodes, and four ocular electrodes (VEOU, VEOL, HEOL and HEOR) were used to remove artifacts. Throughout the experiment, the experimenters ensured that the electrodes remained firmly fixed, and the impedance of all electrodes had to be below 15 kΩ before the experiment started in order to obtain high-quality EEG data. The sampling rate was set to 1000 Hz, and a 30 Hz low-pass filter was used to remove high-frequency noise. In the start phase, the data were segmented from 200 ms before stimulus presentation to 1000 ms after stimulus presentation, and the 200 ms before stimulus presentation served as the baseline for correction. The ERP data of the six experimental trials were then combined in pairs (“classical” and “modern”; “tough” and “round”; “speed” and “steady”) to complete the preprocessing. After preprocessing, the available data from the remaining 20 subjects were reviewed.
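The segmentation and baseline correction described above can be illustrated with a short sketch. The processing in the study was done in Curry 8.0; the function `extract_epoch` and its parameter names are hypothetical, and the only values taken from the text are the 1000 Hz sampling rate and the −200 ms to 1000 ms window.

```python
def extract_epoch(signal, onset, sfreq=1000, tmin=-0.2, tmax=1.0):
    """Cut one baseline-corrected epoch around a stimulus onset.

    signal: list of samples (in microvolts) from a single channel.
    onset:  sample index at which the stimulus was presented.
    """
    pre = round(-tmin * sfreq)   # samples before onset (200 at 1000 Hz)
    post = round(tmax * sfreq)   # samples after onset (1000 at 1000 Hz)
    if onset - pre < 0 or onset + post > len(signal):
        raise ValueError("epoch extends beyond the recording")
    epoch = signal[onset - pre:onset + post]
    # Baseline correction: subtract the mean of the pre-stimulus interval.
    baseline = sum(epoch[:pre]) / pre
    return [v - baseline for v in epoch]
```

Averaging such epochs over the trials of one condition would give the ERP waveform for that condition.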
3.2.2. Startup Phase P300
The ERP data from 200 ms before stimulus presentation to 1000 ms after stimulus presentation were selected for analysis in the start phase. The topographic map of the brain area is shown in Figure 4. From the figure, it can be seen that there was obvious positive activation in the occipital area and left prefrontal lobe during the period of 200–400 ms, while there was obvious negative activation in the right prefrontal lobe during the period of 300–500 ms. As shown in Figure 5, there was an apparent P300 component during the period 200–500 ms and an obvious N400 component during the period 300–600 ms, followed by a late positive component at 600–700 ms. According to the topographic map of the brain and the overall waveform, the energy of the left frontal area gradually increased in the period of 200–300 ms, indicating the presence of P300 components. The energy of the left frontal area increased significantly in the period of 300–500 ms, and the P300 component peaked. In addition, the energy of the right frontal area decreased significantly during the 300–500 ms period, resulting in N400 components. Late components occurred after 600 ms.
For the P300 components, we selected the average amplitude of the ERP during the 300–400 ms period of the start phase for RMANOVA. The distribution positions of the brain electrodes (frontal electrodes (AF3 and AF4), superior electrodes (FC5 and FC6) and occipital electrodes (O1 and O2)) were taken as the intergroup variables, and the three groups of image word pairs and three types of semantic matching as intragroup variables. The results showed that the degree of semantic matching and the position of the electrodes in the brain region had a large effect on the P300 components. The degree of semantic matching had a significant main effect, F (2, 18) = 52.962, p < 0.001. The electrode position in the brain region had a significant main effect, F (5, 92) = 21.092, p < 0.001. There was a significant interaction effect between image word pairs and semantic matching, F (10, 10) = 5.828, p = 0.005. There was an interaction effect between semantic matching degree and brain area distribution, F (10, 10) = 4.841, p = 0.01. A paired t-test was performed for the distribution position of electrodes in the brain area. The average amplitude was highest at electrode AF3, which is consistent with the topographic map of the ERP, as shown in Figure 6. Therefore, electrode AF3 was selected for further analysis of the P300 components.
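As a rough illustration of the mean-amplitude measure used above, the sketch below averages an epoch over a latency window and reports the electrode with the largest value. It simplifies the selection step, which in the study relied on paired t-tests between electrodes; the function names are hypothetical, and the epochs are assumed to start 200 ms before stimulus onset and to be sampled at 1000 Hz.

```python
def mean_amplitude(epoch, start_ms, end_ms, sfreq=1000, tmin_ms=-200):
    """Mean amplitude of an averaged epoch between start_ms and end_ms."""
    i0 = round((start_ms - tmin_ms) * sfreq / 1000)
    i1 = round((end_ms - tmin_ms) * sfreq / 1000)
    window = epoch[i0:i1]
    return sum(window) / len(window)


def strongest_electrode(erps, start_ms, end_ms):
    """Return the electrode with the highest mean amplitude in the window.

    erps: dict mapping electrode name to its averaged epoch (list of samples).
    """
    amps = {name: mean_amplitude(ep, start_ms, end_ms) for name, ep in erps.items()}
    return max(amps, key=amps.get), amps
```

For a negative-going component such as the N400, the same idea would select the most negative mean amplitude (`min` instead of `max`).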
For each type of image–word pair at AF3, stimulus materials with different degrees of semantic matching induced P300 components with different amplitudes. For “tough–round” and “speed–steady”, the average P300 amplitude followed the order image inconsistent > image neutral > image consistent, while for “classical–modern” the order was slightly different: image consistent > image inconsistent > image neutral. Paired t-tests across the levels of semantic matching for the three groups of image words showed that, for the average P300 amplitude induced using the “classical–modern” word pair, there was no significant difference between image consistent and image inconsistent, t = 1.056, p = 0.252. The average P300 amplitude for image neutral was significantly lower than that for image consistent and image inconsistent. The conclusion for the word pairs “tough–round” and “speed–steady” was similar, i.e., image inconsistent > image neutral > image consistent. A small difference was that for the word pair “tough–round”, there was also a significant difference between image neutral and image inconsistent, t = 2.254, p = 0.036.
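The paired comparisons reported above rest on the paired t statistic, which can be sketched as follows. The function name `paired_t` and the amplitude values are hypothetical; the p-value, which requires the t distribution, is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for two conditions measured on the same subjects.

    Returns (t, degrees_of_freedom).
    """
    if len(a) != len(b):
        raise ValueError("both conditions need one value per subject")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical mean amplitudes (microvolts) for four subjects.
consistent = [5.0, 7.0, 9.0, 6.0]
inconsistent = [3.0, 4.0, 8.0, 5.0]
t_value, df = paired_t(consistent, inconsistent)
print(f"t({df}) = {t_value:.3f}")
```

With the 20 subjects retained after preprocessing, each such test would have 19 degrees of freedom.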
3.2.3. Startup Phase N400
For the N400 component, the average amplitude of the ERP in the period of 350–500 ms during the start phase was selected for RMANOVA. The distribution positions of the brain electrodes (frontal electrodes (AF3, AF4, F7 and F8) and occipital electrodes (O1 and O2)) were taken as intergroup variables, and the three groups of image word pairs (classical–modern, tough–round and speed–steady) and three types of semantic matching (image consistent, image neutral and image inconsistent) were used as within-group variables. The results showed that image word pairs, semantic matching and electrode distribution in brain regions all had some influence on the average amplitude of N400 components. Image word pairs had a main effect, F (1.757, 33.374) = 5.989, p = 0.005; semantic matching had a significant main effect, F (1.656, 31.464) = 17.811, p < 0.001; and there was a significant main effect of the location of electrodes in the brain area, F (5, 15) = 31.513, p < 0.001. There was a significant interaction between image word pairs and semantic matching, F (7, 16) = 4.596, p = 0.001. There was an interaction effect between brain region distribution and semantic matching, F (10, 10) = 5.528, p = 0.006. The paired t-test of the brain region electrode distribution position showed that electrode F8 had a higher N400 amplitude than other electrodes, and the difference was significant, as shown in Figure 7. Therefore, electrode F8 was selected as the target electrode for the N400 component in the initial phase of the research.
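Fractional degrees of freedom such as F (1.757, 33.374) are the kind of values produced by a sphericity correction (for example, Greenhouse-Geisser), although the text does not state which correction, if any, was applied. Under that assumption, the sketch below shows how a correction factor ε scales the uncorrected degrees of freedom; the function name `corrected_df` is hypothetical, and ε is backed out from the reported values rather than taken from the source.

```python
def corrected_df(n_subjects, n_levels, epsilon):
    """Scale repeated-measures degrees of freedom by a sphericity factor."""
    df_effect = (n_levels - 1) * epsilon
    df_error = (n_levels - 1) * (n_subjects - 1) * epsilon
    return df_effect, df_error


# Assumption: 20 subjects and 3 levels, so the uncorrected df are (2, 38).
# epsilon is inferred from the reported error df; it is not given in the text.
epsilon = 33.374 / 38
df_effect, df_error = corrected_df(20, 3, epsilon)
print(round(df_effect, 3), round(df_error, 3))
```

The same scaling with ε ≈ 0.828 would give values close to the reported F (1.656, 31.464) for semantic matching.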
For each type of image–word pair at the F8 electrode, stimulus materials with different degrees of semantic matching induced N400 components with different amplitudes. Paired t-tests across the levels of semantic matching were performed for the three groups of image words. For all three types of image–word pairs, the N400 waveform showed a relatively consistent trend: the N400 amplitude induced using image neutral was the highest, followed by image inconsistent, and the N400 amplitude induced using image consistent was the lowest. For the “classical–modern” word pair, the difference between image consistent and image neutral was significant, t = 5.06, p < 0.001, the difference between image consistent and image inconsistent was significant, t = 4.904, p < 0.001, and the difference between image neutral and image inconsistent was significant, t = −3.139, p = 0.005. For the “tough–round” word pair, the difference between image consistent and image neutral was significant, t = 6.283, p < 0.001, the difference between image consistent and image inconsistent was significant, t = 4.928, p < 0.001, and the difference between image neutral and image inconsistent was significant, t = −3.28, p = 0.005. For the “speed–steady” word pair, the difference between image consistent and image neutral was significant, t = 4.962, p < 0.001, the difference between image consistent and image inconsistent was significant, t = 3.395, p = 0.004, and the difference between image neutral and image inconsistent was significant, t = −4.575, p < 0.001.
3.2.4. Target Phase P300
In the target phase, ERP data from 200 ms before stimulus presentation to 1000 ms after stimulus presentation were selected for analysis. The topographic map of the brain area is shown in Figure 8. From the figure, it can be seen that there was significant positive activation in the occipital area within 100–300 ms and significant negative activation in the apical area within 300–500 ms. The amplitude of the electrode POZ is shown in Figure 9. According to the topographic map of the brain and the overall waveform, the energy in the occipital area increased gradually during the period 100–200 ms, indicating the presence of P300 components; during the period of 200–400 ms, the energy in the occipital area increased significantly, and the P300 component peaked; in addition, the energy in the superior area decreased significantly during the period of 300–500 ms, and the N400 component peaked. The P300 component is significantly related to working memory updating, while the N400 component is related to semantic processing and semantic conflicts. The P300 component here can be explained as follows: because subjects were asked to judge the stimulus materials presented in the start phase based on the image vocabulary, they recalled the image style of the previously presented stimulus materials before making a judgment, so the P300 component appeared.
For the P300 components, the average amplitude of the ERP in the 200–300 ms period of the target phase was selected for RMANOVA. The distribution positions of the brain electrodes (frontal electrodes (AF3 and AF4), top electrodes (FC5 and FC6) and occipital electrodes (POZ and O1)) were taken as intergroup variables, and the three groups of image–word pairs (classical–modern, tough–round and speed–steady) and three types of semantic matching (image consistent, image neutral and image inconsistent) were used as within-group variables. The results show that the degree of semantic matching and the position of the electrodes in the brain region had a larger effect on the P300 components. The degree of semantic matching had a significant main effect, F (1.954, 31.123) = 8.626, p = 0.001; the distribution position of the electrodes in the brain area had a significant main effect, F (5, 15) = 14.717, p < 0.001. There was a significant interaction between image word pairs and semantic matching, F (7, 16) = 9.838, p = 0.001; there was an interaction effect between the degree of semantic matching and the position of the electrode distribution in the brain area, F (10, 10) = 8.066, p = 0.001. The paired t-test of the electrode distribution in the brain area shows that the average amplitude at electrode POZ was the highest, while that at electrode FC5 was the lowest, which is consistent with the topographic map of the ERP, as shown in Figure 10. Therefore, the electrode POZ was selected for further analysis of the P300 components.
The amplitude of P300 triggered by different types of image words at the electrode POZ differed to some extent. For the “classical–modern” word pair, the amplitude of P300 triggered by image neutral was the highest, followed by image consistent and then image inconsistent. For the “tough–round” word pair, the amplitude of P300 triggered by image inconsistent was highest, followed by image neutral, and the amplitude for image consistent was lowest. For the “speed–steady” word pair, the amplitude triggered by image consistent was highest, followed by image neutral, and the amplitude triggered by image inconsistent was lowest. Paired t-tests across the degrees of semantic matching for the three groups of image words show that the P300 waveform differed significantly for all three groups of image word pairs. For the word pair “classical–modern”, the difference between image consistent and image neutral was significant, t = −3.314, p = 0.003, and the difference between image neutral and image inconsistent was significant, t = 3.116, p = 0.005. For the word pair “tough–round”, the difference between image consistent and image inconsistent was significant, t = −3.55, p = 0.002, and the difference between image neutral and image inconsistent was significant, t = −2.941, p = 0.009. For the word pair “speed–steady”, the difference between image consistent and image neutral was significant, t = 3.651, p = 0.002, the difference between image consistent and image inconsistent was significant, t = 5.168, p < 0.001, and the difference between image neutral and image inconsistent was significant, t = 3.273, p = 0.004.
3.2.5. Target Phase N400
For the N400 component, the average amplitude of the ERP in the 250–400 ms period during the target phase was selected for RMANOVA. The distribution positions of the brain electrodes (frontal electrodes (AF3 and AF4), top electrodes (FC1 and FC2) and occipital electrodes (O1 and O2)) were taken as the intergroup variables, and the three groups of image word pairs (“classical–modern”, “tough–round” and “speed–steady”) and three types of semantic matching (image consistent, image neutral and image inconsistent) were used as the within-group variables. The results showed that image word pairs, semantic matching and distribution of electrodes in brain regions all had some effect on the average amplitude of the N400 components. Image word pairs had a main effect, F (1.757, 33.374) = 5.986, p = 0.008; semantic matching had a significant main effect, F (1.954, 31.123) = 8.626, p = 0.001. There was a significant main effect of the location of electrodes in the brain region, F (5, 15) = 14.717, p < 0.001. There was a significant interaction between image word pairs and semantic matching, F (3.039, 57.750) = 4.757, p = 0.005. There was an interaction effect between the location of the electrodes in the brain area and semantic matching, F (10, 10) = 3.924, p = 0.021.
The paired t-test of the electrode distribution in the brain region shows that electrode FC2 had a higher N400 amplitude than other electrodes, and the difference was very significant, as shown in Figure 11. Therefore, electrode FC2 was selected as the target electrode for the N400 components in the target phase of the research.
For each group of image–word pairs at the FC2 electrode, the stimulus materials with different degrees of semantic matching induced N400 components with different amplitudes, with the image-neutral induced N400 amplitude being the highest, followed by the image-inconsistent, and the image-consistent induced N400 amplitude being the lowest. Paired t-tests across the degrees of semantic matching for the three image–word pairs show that the amplitude of the N400 waveform was slightly different among the three groups of image–word pairs. For the word pair “classical–modern”, the N400 amplitude of image neutral was the largest, and the N400 amplitude of image consistent was the smallest. There was a significant difference between image consistent and image neutral, t = 3.921, p = 0.001. There was a significant difference between image consistent and image inconsistent, t = 2.692, p = 0.014. There was a significant difference between image neutral and image inconsistent, t = −3.383, p = 0.003. For the “tough–round” word pair, the N400 amplitude of image neutral was the largest, and the N400 amplitude of image consistent was the smallest. The difference between image consistent and image neutral was significant, t = 3.335, p = 0.004. The difference between image consistent and image inconsistent was significant, t = 2.579, p = 0.019. The difference between image neutral and image inconsistent was significant, t = −2.741, p = 0.013. For the word pair “speed–steady”, the difference between image neutral and image inconsistent was not significant, the N400 amplitude of image neutral was the lowest, the difference between image consistent and image inconsistent was significant, t = 2.192, p = 0.043, and the difference between image neutral and image consistent was significant, t = 2.336, p = 0.035.
3.3. Discussion
Analysis of the ERP data and topographic map revealed that the P300 component occurred prominently in the 300–400 ms period of the start phase. Across the three groups of image–word pairs, the P300 amplitude induced using image neutral was lower than that induced using image inconsistent. For the word pairs “tough–round” and “speed–steady”, the trend of P300 amplitude was image inconsistent > image neutral > image consistent. For the word pair “classical–modern”, image consistent had the highest P300 amplitude. From a behavioral point of view, the response time for “classical–modern” word pairs was generally long and the accuracy rate was generally low, which suggests, to some extent, that the task difficulty of “classical–modern” word pairs was greater than that of the other two types of word pairs. The N400 component occurred in the 350–500 ms period of the start phase, even though the stimuli in this phase were non-semantic. Subjects had been told to make judgments based on an image vocabulary in the subsequent target phase, which meant that after viewing the car pictures in the start phase, the brain compared them with the image words in advance, so N400 components appeared. The average N400 amplitude followed the same rule for all three groups of image word pairs: image neutral > image inconsistent > image consistent.
In the target phase, the latency of the P300 and N400 components was relatively early. P1, N1, P2 and other components appeared in the 0–200 ms period after the presentation of the stimulus, and P300 components with much shorter latency appeared in the 200–300 ms period. For the three groups of image–word pairs, the P300 amplitudes induced using different degrees of semantic matching followed their own rules. For the word pair “classical–modern”, image neutral > image consistent > image inconsistent. For the word pair “tough–round”, image inconsistent > image neutral > image consistent. For the word pair “speed–steady”, image consistent > image neutral > image inconsistent. In addition, there were obvious N400 components at 250–400 ms. N400 is usually triggered by words and phrases that do not match or that violate semantic expectations. Here, N400 was triggered by the image words presented in the target phase. For the three groups of image words, the average amplitude of N400 also followed a consistent rule: image neutral > image inconsistent > image consistent (i.e., mismatched stimulus materials induced a higher N400 amplitude than matched ones), and image-neutral stimulus materials induced the highest N400 amplitude.