1. Introduction
Reading is a cognitively demanding process, and multiple factors can affect how written information is assimilated [1]. Students often read and study in environments where various distractions, whether voluntary or involuntary, compete for their cognitive resources. Common scenarios include reading while listening to music, while watching television, or in the presence of people talking in a café [2,3,4]. Research has demonstrated that listening to music can influence performance in cognitively demanding tasks. However, the magnitude of this effect varies depending on multiple factors, including the nature of the task [5,6], the type of music played [1,7,8], and the personality of the individual [6,9]. It can therefore be inferred that physiological and psychological factors affect readers’ text comprehension and reading performance.
Visual patterns, as captured through eye movement measurements, constitute a well-established physiological indicator for assessing cognitive and perceptual load during reading [10,11]. Eye-tracking has begun to attract the interest of educational researchers [12,13] because it provides real-time measures of the reading process and captures moment-to-moment cognitive control during reading [14]. Furthermore, numerous studies have demonstrated that eye-tracking is an effective research tool for elucidating readers’ processing strategies during reading [15,16,17,18,19,20,21,22]. One of the most significant general eye movement measures relevant to reading is fixation duration [12], the period during which the eye remains focused on a specific word. In adult readers, fixations typically range from 60 to 500 milliseconds (ms), with an average duration of approximately 250 ms [12]. A longer fixation duration is typically linked to increased cognitive demands during task performance [23], such as when a task is performed while listening to music or in café noise.
The distribution of visual attention can be influenced by a multitude of cognitive processes, including, but not limited to, perception, memory, language, and decision-making. The duration of a fixation can also be influenced by factors such as the point at which the eyes land. Although the eye-brain connection is not perfectly precise, a substantial body of research has demonstrated that the eyes reflect the cognitive processing of the object being looked at in that particular moment [23,24,25,26]. This makes eye-tracking a widely used tool in research investigating cognitive processes, as it provides data on the moment-to-moment evolution of cognition over time, rather than only revealing the final outcome. Although each individual can choose what to look at and when to look at it, the subtleties of eye movements are largely involuntary; indeed, individuals generally have no memory of the specific point in time at which they looked at a particular object [13]. Consequently, eye-tracking can capture processing that occurs outside conscious awareness.
A second key aspect in the examination of text comprehension is reading performance, which is assessed through comprehension and reading time. Comprehension reflects the reader’s ability to process textual information, understand it, and integrate objective content, whereas reading time denotes the duration required for a participant to complete a given text [
10,
27]. Previous research has investigated the influence of background music on reading comprehension, with findings indicating that certain types of music, such as Mozart’s compositions and highly repetitive music, can have a positive effect [
28]. In addition, there are studies which suggest that background music may enhance the learners’ emotional states by helping to regulate their valence and arousal levels during reading [
1,
6]. On the other hand, other studies have reported negative effects on reading comprehension when background music includes genres such as hip-hop, slow-tempo music, fast and loud music, or familiar non-lyrical music [
29]. The discrepancies in these findings may be attributed to variations in musical characteristics, including loudness, tempo, structure, complexity, genre, familiarity, and whether the music contains lyrics or is purely instrumental [
30]. Moreover, individual differences among participants, such as musical preferences and expertise, may further contribute to these inconsistencies [1,6]. A further possible explanation is that reading comprehension itself is influenced by a multitude of factors: as a complex cognitive process, reading is shaped by elements such as word recognition, reading strategies, comprehension skills, and motivation [31].
Finally, understanding the readers’ attitudes towards music offers a psychological perspective on reading conditions that may differ from objective reading outcomes, such as text comprehension. From a cognitive science standpoint, an individual’s perception or self-assessment (e.g., perceived levels of understanding or difficulty) does not always align with objective behavioral measures, including visual patterns and reading comprehension [
10]. Therefore, beyond assessing reading comprehension, evaluating reading attitudes—by exploring perceived aspects such as difficulty, confidence, and preference—may provide additional insights into text comprehension. Previous research on the subject has highlighted the impact of reading attitudes on comprehension. Specifically, intrinsic motivation, perceived difficulty, and self-assessed confidence in comprehension have been found to influence reading performance and achievement [
10].
Given the significance of visual patterns, reading performance, and reading attitudes, this study uses eye-tracking technology to examine these variables and to gain a deeper understanding of how text comprehension is affected by different types of music and by one type of background noise. To investigate whether certain types of music improve text comprehension, we focus not on a specific type of music or on particular aspects of it, but on personal preference: reading while listening to preferred music and reading while listening to non-preferred music. In addition, because students often choose to study in environments where there is noise from people talking, text comprehension is studied in the presence of recorded noise from a café. Finally, text comprehension is studied in silence. The research hypotheses are presented in a sequence that runs from visual patterns to reading performance and then to reading attitudes, consistent with the structure and aims of the study:
Visual Patterns Hypotheses:
Hypothesis 1 (H1): Time to First Fixation (TTFF) is expected to be longer in the non-preferred music and café noise conditions than in the preferred music and silence conditions.
Hypothesis 2 (H2): Participants are expected to exhibit longer fixation durations and a higher fixations count when reading under the non-preferred music and café noise conditions than under the preferred music and silence conditions.
Hypothesis 3 (H3): First Fixation Duration (FFD) is not expected to differ significantly across the four reading conditions.
Reading Performance Hypotheses:
Hypothesis 4 (H4): Participants are expected to achieve higher reading comprehension scores in the preferred music and silence conditions than in the non-preferred music and café noise conditions.
Hypothesis 5 (H5): Reading time is expected to be longer in the non-preferred music and café noise conditions than in the preferred music and silence conditions.
Reading Attitudes Hypotheses:
Hypothesis 6 (H6): Participants are expected to report higher perceived difficulty and fatigue when reading under the non-preferred music and café noise conditions than under the preferred music and silence conditions.
Hypothesis 7 (H7): Participants are expected to report higher perceived understanding, confidence, and immersion when reading under the preferred music and silence conditions than under the non-preferred music and café noise conditions.
2. Materials and Methods
The objective of the experiment was to investigate the impact of different auditory environments on visual patterns, reading performance and reading attitudes. Four reading conditions were employed: preferred music, non-preferred music, café noise, and silence. The experimental design was focused on measuring the extent to which readers could comprehend different texts in different reading conditions. Visual patterns were measured by fixation duration, fixations count, TTFF and FFD. Reading performance was measured by reading comprehension and reading time, and reading attitudes were assessed by five measures of reader attitude: perceived difficulty, perceived understanding, perceived confidence, perceived fatigue, and perceived immersion.
2.1. Experimental Set Up
The reading materials were selected from the reading comprehension section of the Greek language texts of the Panhellenic Examinations, which are conducted annually and assess the comprehension and knowledge of the Greek language of students seeking admission to Greek universities. To ensure consistency across the texts, the reading materials were selected according to two criteria. Firstly, all the selected texts had to be of comparable length and readability. The texts averaged 480 words (ranging from a minimum of 407 to a maximum of 564) and comprised five to six paragraphs (sentences) of approximately 90 words each (with a minimum of 83 and a maximum of 100). Secondly, the selected texts had to differ from those used in the national university entrance examinations taken by the participants. The texts were displayed on a computer screen in a large Arial font (21 points) with a line spacing of 1.5. Each text was followed by five true/false questions. The subject areas of the texts and the corresponding questions were sociology, philosophy and education.
The eye-tracking device employed in the experiment was the Tobii Pro X2-60 system, which records the user’s gaze data in real time. The system operates at a sampling rate of 60 Hz and provides a spatial accuracy of <0.5°, making it well-suited for reading research. The texts were displayed on a 15.6-inch Dell Vostro 15,500 series monitor, with a resolution of 1024 × 768 pixels and a physical size of 380 × 300 mm. The music was delivered via speakers positioned in front of the participant, adjacent to the monitor. The volume of the background music was standardized to ensure that each participant was able to read the text comfortably. The café noise was a recording made in a café, which was then normalized using Adobe Audition 3.0 software. The average volume of the noise was between 60 and 70 dB and never exceeded 80 dB. The sound level was measured while the participants were seated.
2.2. Experimental Protocol
The experiment employed a within-subject experimental design, whereby each participant was exposed to four distinct texts under four different conditions: preferred music, non-preferred music, café noise, and silence. In order to minimize the possibility of carry-over effects, each participant was assigned to receive the reading conditions and texts in a specific order. The particulars of the experimental protocol are described below.
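The section states only that each participant received the conditions and texts "in a specific order"; the exact scheme is not given. As an illustration, the sketch below shows one common way to build such orders, a cyclic Latin square. The text labels and the function names `cyclic_latin_square` and `assign_order` are hypothetical and are not taken from the study.

```python
# Illustrative counterbalancing sketch; the study's actual ordering scheme is not specified.
CONDITIONS = ["preferred music", "non-preferred music", "café noise", "silence"]
TEXTS = ["text A", "text B", "text C", "text D"]  # hypothetical labels


def cyclic_latin_square(items):
    """Row r is the item list rotated left by r positions."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]


def assign_order(participant_index):
    """Pair a condition order with a text order for one participant."""
    n = len(CONDITIONS)
    condition_order = cyclic_latin_square(CONDITIONS)[participant_index % n]
    # Rotate the text order once per completed block of n participants,
    # so that condition-text pairings change between blocks.
    text_order = cyclic_latin_square(TEXTS)[(participant_index // n) % n]
    return list(zip(condition_order, text_order))


for participant in range(4):
    print(participant, assign_order(participant))
```

In a cyclic square each condition appears once in every serial position across a block of four participants, which balances position effects but not every possible sequence of conditions.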
The participants were invited to the Laboratory of Bilingual Education of the Department of Special Education of the University of Thessaly, where the experiment leader provided an explanation of the experiment’s purpose and the process of the study (see
Figure 1 for an overview of the study design). Protecting confidentiality and privacy is standard practice in research involving human participants; accordingly, the participants’ real names and other identifiable characteristics were concealed to minimize the risk of exposing their personal information [
32]. Participants were de-identified and assigned a unique ID number (for student male and female participants, e.g., SM1, SF2). They were assured that their personal information would be protected and thus each participant signed a consent form, indicating their understanding of the research aims and confidentiality policies, as well as their agreement to participate in the study. After that, the participants were required to complete a reading comprehension ability test and a pre-survey questionnaire concerning their demographics, as well as their music and reading habits. All participants were instructed to bring a 20–30 min long piece of music that they preferred to listen to while reading. The participants selected a diverse range of musical genres, including instrumental classical music, various forms of modern pop and jazz, and vocal music. Additionally, the participants indicated to the experiment leader which genres of music they did not prefer to listen to while reading. The most commonly cited examples of such music were metal and heavy metal. A comprehensive list of all preferred and non-preferred music genres is provided in
Table A1. The experiment leader selected 20–30 min long pieces of music from these categories that the participants indicated they did not prefer to listen to while reading. These pieces were then brought to the Laboratory for the experiment.
Subsequently, the participants were seated in front of the computer and the eye-tracking device, at a distance of 0.40 m from the computer screen and as comfortably as possible. The experiment leader informed the participants that they would read four texts while the eye tracker recorded their eye movements. After each text, they were required to answer the questions of a reading comprehension test (RC Test) designed to evaluate their comprehension of the text they had just read. The experiment leader informed the participants that their eye movements would not be recorded while they answered the questions. A one-minute intermission was permitted between reading tasks. The participants were instructed to read one of the texts while listening to their preferred music, one while listening to non-preferred music, one while listening to recorded noise from a café, and one in complete silence. The participants were not informed of the sequence in which the four reading conditions would be presented. They were informed that they could access the texts by using the mouse and that they could take as much time as they required to complete each reading task. To prevent significant discrepancies in overall completion times between participants, the completion time for each reading task was not to exceed 20 min. However, the participants were not told that their time was being recorded, so that they would feel comfortable and free of undue pressure throughout the experiment. Following the completion of the four reading tasks, participants were invited to complete a survey questionnaire designed to assess their attitudes towards reading.
With regard to the eye-tracking procedure, the participants were informed that a 9-point calibration would be conducted before each reading task. Calibration accuracy was assessed using deviation errors, with an acceptance threshold of <0.5°. If the deviation exceeded this threshold, recalibration was conducted. Additionally, the participants were instructed to keep their head in a fixed position during the reading tasks. However, they were permitted to move their head freely during the response phase of each text, during which the eye tracker did not record their eye movements. The experiment was conducted in two sessions, with a five-minute rest period between them. To familiarize the participants with the eye-tracking technology, the calibration process, the reading procedure, and the RC Test, a simulation experiment was conducted during Session 1. The participants were required to read an example text in one reading condition (reading while listening to preferred music) and were then asked to complete an RC Test. In the second session, the participants completed the main experiment as described above. Afterwards, the experiment leader presented each participant with a certificate of participation in the research project and provided them with a summary of the project’s conclusions.
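To give a sense of scale for the 0.5° calibration threshold, the following sketch converts a visual angle into an on-screen distance using values reported in this section (0.40 m viewing distance, 380 mm screen width, 1024 px horizontal resolution). The function name `visual_angle_to_pixels` is illustrative, and the calculation assumes the gaze point lies near the screen centre on a flat screen viewed head-on.

```python
import math


def visual_angle_to_pixels(angle_deg, distance_mm, screen_width_mm, screen_width_px):
    """Convert a visual angle to an on-screen extent in pixels (flat screen, central gaze)."""
    # Extent subtended by the angle at the given viewing distance.
    extent_mm = 2 * distance_mm * math.tan(math.radians(angle_deg) / 2)
    # Horizontal pixel density of the display.
    pixels_per_mm = screen_width_px / screen_width_mm
    return extent_mm * pixels_per_mm


threshold_px = visual_angle_to_pixels(0.5, 400, 380, 1024)
print(f"0.5 degrees is roughly {threshold_px:.1f} px on this setup")
```

Under these assumptions the threshold corresponds to roughly 9 px, or about 3.5 mm on the screen.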
2.3. Subjects
A total of 10 undergraduate students from the University of Thessaly in Greece (
n = 10, comprising 9 females and 1 male) participated voluntarily in this pilot study.
Table A2 presents the participants’ demographic details, as well as their music and reading habits. The mean age of the participants was 22.8 years (SD = 7.2), and they were majoring at the time in the subject of special education. All participants were native speakers of the Greek language and reported having normal or corrected-to-normal vision, with or without the use of contact lenses or glasses.
2.4. Measurements
Prior to the reading tasks, participants completed a survey comprising questions about their demographics, as well as their music and reading habits. The pre-reading survey, comprising 10 questions, was designed to assess eight variables: demographics, reading environment, reading habits, reading performance with background noise and music, self-perception of reading ability with music, reading frequency with music, and music preferences while reading. In the experiment, the visual patterns of the participants were measured in terms of four eye movement measures: fixation duration, fixations count, TTFF and FFD. Moreover, the experiment yielded two measures of reading performance, reading comprehension and reading time, which were evaluated in conjunction with five distinct measures of reading attitudes: perceived difficulty, perceived understanding, perceived confidence, perceived fatigue, and perceived immersion.
2.4.1. Visual Patterns Measurements
The visual patterns were assessed in terms of Fixation Duration (ms), Fixations Count (number of fixations), Time To First Fixation (TTFF) (ms) and First Fixation Duration (FFD) (ms). Each measure was computed as the average of the values recorded while the participant read a text on the computer screen. Fixation Duration is the average time spent fixating on a word or area of interest (AOI). Longer fixation durations indicate increased cognitive processing demands. Fixations Count is the total number of fixations recorded per reading task, reflecting the overall level of visual engagement with the text. Time To First Fixation is the time elapsed from text onset until the participant’s first fixation on the text, serving as a measure of attentional allocation efficiency, and First Fixation Duration is the duration of the initial fixation on a word, which is associated with early lexical processing. Fixations falling outside the cutoff range were excluded from the analysis. The cutoff range was defined as two standard deviations above and below the mean, so that fixations shorter than 100 ms or longer than 500 ms were excluded.
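The four measures defined above can be sketched as a small computation over a list of fixations. This is a minimal illustration rather than the processing pipeline used in the study: the `Fixation` record, its `word_index` field, and the choice to apply the 100–500 ms cutoff before computing TTFF and FFD are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Fixation:
    onset_ms: float            # time since text onset
    duration_ms: float
    word_index: Optional[int]  # hypothetical word-level AOI; None if off the text


def summarize_fixations(fixations, min_ms=100.0, max_ms=500.0):
    """Compute the four visual-pattern measures for one reading task."""
    # Keep on-text fixations inside the cutoff range, in temporal order.
    kept = sorted(
        (f for f in fixations
         if f.word_index is not None and min_ms <= f.duration_ms <= max_ms),
        key=lambda f: f.onset_ms,
    )
    if not kept:
        return None
    # Duration of the first retained fixation on each word.
    first_on_word = {}
    for f in kept:
        first_on_word.setdefault(f.word_index, f.duration_ms)
    return {
        "fixation_duration_ms": mean([f.duration_ms for f in kept]),
        "fixations_count": len(kept),
        "ttff_ms": kept[0].onset_ms,
        "ffd_ms": mean(list(first_on_word.values())),
    }


sample = [
    Fixation(180, 220, 0),
    Fixation(420, 60, 1),       # below the cutoff, excluded
    Fixation(500, 260, 1),
    Fixation(800, 300, 0),      # refixation of word 0, ignored for FFD
    Fixation(1150, 240, None),  # off the text, excluded
]
print(summarize_fixations(sample))
```

For the sample data, three fixations are retained, so the refixation contributes to the mean fixation duration and the count but not to FFD.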
2.4.2. Reading Performance Measurements
The reading performance of the participants was evaluated in terms of their reading comprehension and reading time. To assess reading comprehension, participants completed an RC Test consisting of five questions for each text. The questions were in a true/false format and were designed to assess the participants’ general understanding of the text. A score of one was awarded for each correct answer; because each reading task was based on a single text, the maximum score for each task was five points. The reading time was calculated as the number of seconds (s) required to read each text, beginning with the onset of eye movement and concluding when the reading stopped. The data pertaining to the reading activities were recorded via the eye-tracking device, while the reading time was recorded and analyzed using the eye-tracking software. The data were validated by the research team.
2.4.3. Reading Attitudes Measurements
In the experiment, participants were invited to answer a survey questionnaire designed to assess their attitudes towards reading. The experiment leader employed a four-point Likert scale to calculate the scores for five perceived reading attitudes: difficulty, understanding, confidence, fatigue, and immersion. The calculation of each attitude score is explained below. Firstly, perceived difficulty refers to the level of difficulty associated with each reading task, as evaluated by the participant. This was assessed through the following question: “How difficult did you perceive this reading task to be?”. A score was calculated to quantify the perceived difficulty of each reading task, ranging from 1 (very easy) to 4 (very difficult). Secondly, perceived understanding was assessed through the question: “How clearly did you understand the reading text?”. Responses were recorded on a scale from 1 (poorly understood) to 4 (completely understood), providing a measure of the participants’ self-assessed comprehension of the reading task. Thirdly, perceived confidence was evaluated through the following question: “How confident are you about your text comprehension?”. The scale ranged from 1 (lack of confidence) to 4 (complete confidence). Fourthly, perceived fatigue was assessed using two statements: “My eyes became tired as I read the text” and “I felt comfortable reading the text” (reverse-scored). Responses were recorded on a scale from 1 (very unlikely) to 4 (very likely), capturing the extent to which participants experienced fatigue during reading. Finally, perceived immersion was evaluated through two statements: “I became really involved in reading the text” and “I could not focus on reading the text” (reverse-scored). 
This scale, ranging from 1 (strongly disagree) to 4 (strongly agree), was employed to measure the degree of engagement participants experienced while reading. The content of the survey questionnaire was based on the relevant literature concerning reading attitudes [
10,
33].
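The reverse-scored items mentioned above can be handled as in the following sketch. On a 1 to 4 scale a reversed response is 5 minus the raw response. How the two items of the fatigue and immersion measures were combined is not stated in this section; averaging them, as done here, is an assumption, and the function names are illustrative.

```python
SCALE_MIN, SCALE_MAX = 1, 4  # four-point Likert scale


def check_response(response):
    """Reject values outside the four-point scale."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} is outside {SCALE_MIN}-{SCALE_MAX}")
    return response


def reverse_score(response):
    """Map 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1."""
    return SCALE_MIN + SCALE_MAX - check_response(response)


def two_item_score(direct_item, reversed_item):
    """Average a directly scored item with a reverse-scored item (assumed aggregation)."""
    return (check_response(direct_item) + reverse_score(reversed_item)) / 2


# Fatigue example: "My eyes became tired" = 3, "I felt comfortable" = 2 (reverse-scored).
print(two_item_score(direct_item=3, reversed_item=2))  # 3.0
```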
2.5. Data Preprocessing and Analysis
Raw gaze data were collected and processed using Tobii Pro Eye-Tracker Manager (Tobii Pro X2-60) and iMotions 7.1 software to ensure data integrity and consistency. The preprocessing procedure involved multiple steps, including blink removal, where data points containing blinks or missing values were excluded from fixation calculations, and saccade filtering, where eye movements exceeding a velocity of 30°/s were classified as saccades and removed to isolate fixations. Additionally, fixation aggregation was performed by computing and averaging fixation durations and fixations count for each participant and reading condition. The Statistical Package for Social Sciences (SPSS) was utilized to analyze statistical differences across four visual patterns and two reading performance measures under the four auditory conditions: preferred music, non-preferred music, café noise, and silence. Statistical tests were conducted to examine how these four reading conditions (independent variable) influenced the measured dependent variables, which included fixation duration, fixations count, TTFF, FFD, reading time, reading comprehension, perceived difficulty, perceived understanding, perceived confidence, perceived fatigue, and perceived immersion. To select a suitable test, the assumptions of Analysis of Variance (ANOVA) were first examined for all dependent variables. For each variable, the assumptions assessed were the independence of observations, normality (observations within each condition should be normally distributed) and homogeneity of variance (the variance of the observations within each condition should be equal). When all the assumptions were met, ANOVA was used; otherwise, the non-parametric Kruskal–Wallis test was used to determine whether the dependent variable differed statistically between the reading conditions. 
The results of the ANOVA assumptions assessment are summarized in
Table 1. For the variables fixation duration, fixations count and reading time, all the assumptions were met and ANOVA was used. For the remaining variables, the normality assumption was violated; therefore, the Kruskal–Wallis test was used. All analyses were followed by post hoc tests: the Bonferroni post hoc test after an ANOVA and the ‘Dwass, Steel, Critchlow-Fligner’ test after a Kruskal–Wallis test.
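The two omnibus statistics named above can be illustrated with a short, dependency-free sketch. It is not the SPSS procedure used in the study: it computes only the one-way ANOVA F statistic and the tie-corrected Kruskal–Wallis H statistic with their degrees of freedom, and leaves out the assumption checks, p-values, and post hoc tests. The function names and toy data are illustrative.

```python
from collections import Counter
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def kruskal_wallis_h(groups):
    """Return (H, df) using midranks and the standard tie correction."""
    counts = Counter(x for g in groups for x in g)
    n = sum(counts.values())
    # Tied values share the average of the ranks they occupy.
    rank_of, next_rank = {}, 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # The correction is zero only if every observation is identical.
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova_f(toy))
print(kruskal_wallis_h(toy))
```

With four conditions and ten observations per condition, `one_way_anova_f` returns degrees of freedom of 3 and 36. Obtaining p-values would require the F and chi-square distributions, which a statistics package provides.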
3. Results
This section presents the results from the statistical analysis of the data obtained from the experiment regarding our hypotheses on visual patterns, reading performance, and reading attitudes.
The mean values and the standard deviations of the examined variables, as well as the F-values and p-values of the ANOVA analyses, are presented in Table 2 below. For the visual patterns, the ANOVA results for fixation duration (F(3,36) = 0.16, p = 0.924) and fixations count (F(3,36) = 0.66, p = 0.585) and the Kruskal–Wallis results for TTFF (χ²(3) = 6.02, p = 0.110) and FFD (χ²(3) = 3.60, p = 0.309) indicated, as shown in Figure 2, that there was no statistically significant difference in these variables across the four study conditions.
Regarding reading performance, for both reading comprehension (χ²(3) = 1.47, p = 0.690) and reading time (F(3,36) = 2.25, p = 0.099), the Kruskal–Wallis and ANOVA tests, respectively, revealed, as shown in Figure 3, no statistically significant difference across the four conditions.
For the five examined reading attitudes, perceived difficulty (χ²(3) = 20.80, p < 0.001), perceived understanding (χ²(3) = 25.67, p < 0.001), perceived confidence (χ²(3) = 27.62, p < 0.001), perceived fatigue (χ²(3) = 8.80, p = 0.032) and perceived immersion (χ²(3) = 21.00, p < 0.001), the Kruskal–Wallis test indicated, as shown in Figure 4, that there were statistically significant differences across preferred music (P), non-preferred music (NP), café noise (C), and silence (S).
The “Dwass, Steel, Critchlow-Fligner” (DSCF) post hoc tests for each of the reading attitudes are presented in Table 3. For perceived difficulty, non-preferred music had a significantly higher mean value than silence and preferred music. The largest difference was observed between silence and non-preferred music (DSCF = 5.43, p < 0.001), followed by the difference between preferred and non-preferred music (DSCF = 4.89, p < 0.001). For perceived understanding, significant differences were observed between preferred and non-preferred music, silence and non-preferred music, and café noise and non-preferred music. The highest level of understanding was observed for preferred music and silence (mean = 3.1), followed by café noise (mean = 2.8) and finally non-preferred music (mean = 1.8). The largest differences were observed between silence and non-preferred music and between preferred and non-preferred music (DSCF = 5.25, p < 0.001). For perceived confidence, silence and preferred music had the highest levels of confidence, while the lowest mean confidence was observed for non-preferred music (mean = 2.3). The largest difference was observed between silence and non-preferred music (DSCF = 5.47, p < 0.001). For perceived fatigue, non-preferred music (mean = 2.3) had significantly higher mean fatigue values than silence (mean = 1.6) and café noise (mean = 1.6). The differences were the same for both comparisons (p = 0.047). For perceived immersion, significant differences were observed between preferred and non-preferred music, silence and non-preferred music, and café noise and non-preferred music. The highest level of immersion was observed for preferred music and silence (mean = 2.2), followed by café noise (mean = 1.8) and non-preferred music (mean = 1.2). The largest difference was observed between preferred and non-preferred music (DSCF = 5.38, p = 0.001). In summary, for all reading attitudes, the DSCF post hoc tests indicated no significant difference between preferred music and silence, while the largest differences were most often observed between silence and non-preferred music.
4. Discussion
In this study, we used eye-tracking technology to examine how text comprehension is affected by different reading conditions (preferred music, non-preferred music, café noise, and silence) across the three variables of visual patterns, reading performance and reading attitudes.
4.1. Visual Patterns in Text Comprehension
According to H1, Time To First Fixation (TTFF) is expected to be longer in the non-preferred music and café noise conditions than in the preferred music and silence conditions. This hypothesis was based on the assumption that non-preferred music and café noise would delay visual attention allocation due to increased cognitive load. Contrary to the hypothesis, there was no statistically significant difference in TTFF across the four reading conditions: preferred music, non-preferred music, café noise, and silence. Participants initiated the reading at a consistent pace regardless of the auditory condition, indicating their ability to adapt promptly to the auditory stimuli.
This finding contradicts studies indicating that auditory distractions, especially unpredictable or aversive noise, can delay attention allocation and increase TTFF [
22,
34,
35]. For instance, Tsai et al. [
22] found that a noise-induced cognitive load prolongs fixation time as readers filter out irrelevant stimuli. However, the findings align with studies suggesting that visual attention processes, particularly initial fixation, are largely automatic and less affected by auditory distractions [
13,
23]. Furthermore, the consistent TTFF observed in this study lends support to the notion that visual attention remains stable in the presence of familiar or predictable distractions, as evidenced by prior research in reading in café noise and preferred music conditions [
23].
In conclusion, the observed discrepancy between the results and our initial hypothesis may be attributed to the participants’ familiarity with background noise: 40% of the participants reported frequently using music while studying (see Appendix A, Table A2). Furthermore, the predictable nature of café noise is likely to have reduced the cognitive load it imposed, making it less disruptive. Therefore, H1 was not confirmed.
According to H2, the participants were expected to exhibit longer fixation durations and a higher fixations count when reading under the non-preferred music and café noise conditions than under the preferred music and silence conditions. This was based on the assumption that non-preferred music and café noise would increase cognitive load, leading to more frequent and prolonged fixations as readers make an effort to compensate for auditory distractions. However, the findings did not support this hypothesis, as no statistically significant differences were observed in fixation duration or fixations count across the four conditions (preferred music, non-preferred music, café noise, and silence). Participants demonstrated consistent visual patterns irrespective of the auditory environment, indicating effective management of distractions and sustained visual focus.
The findings contradict the predictions of the cognitive load theory, which states that auditory distractions increase cognitive demands, resulting in longer fixation duration and higher fixations count as readers process text more carefully [
13,
35,
36]. For example, Rayner [
13] found that challenging tasks, particularly those involving distractions, often require additional visual resources to overcome external challenges.
However, the results are consistent with studies suggesting that predictable noise, such as café noise, exerts minimal impact on visual patterns compared to unpredictable or fluctuating auditory distractions [
34]. The findings further corroborate the notion that individuals with experience in handling background noise or environments devoid of distractions possess the capacity to regulate their attention effectively [
13]. The participants’ inclination towards instrumental or low-distraction music (55.6%) and their frequent engagement in study activities accompanied by background music (40%), as shown in
Table A2, are likely to have contributed to their ability to maintain consistent visual patterns.
The observed discrepancy between the results and our initial hypothesis could be attributed to the relatively uncomplicated reading tasks used in this study, which did not impose a significant cognitive load on the participants. Additionally, the participants’ familiarity with the auditory environments may have also reduced the impact of the distraction caused by non-preferred music and café noise. Therefore, H2 was not confirmed.
According to H3, First Fixation Duration (FFD) was not expected to differ significantly across the four reading conditions. This prediction was based on the assumption that FFD, as a measure of initial visual engagement, reflects automatic, low-level processing that is less susceptible to external auditory distractions. Consistent with this prediction, FFD did not vary significantly across the four reading conditions, indicating that auditory distractions, such as non-preferred music and café noise, did not affect the participants’ initial engagement with the text.
These findings are consistent with those of the existing research. Rayner [
13] claimed that FFD primarily represents automatic processes such as word recognition and attention allocation, which are robust against environmental distractions. In addition, Tsai et al. [
22] found that FFD remains stable in predictable environments, such as those of café noise, where distractions are less intrusive. Furthermore, Haapakangas et al. [
34] observed that low-level auditory distractions minimally influence early visual processing stages, though they may affect later cognitive engagement. Additionally, Reichle et al. [
23] observed that FFD remains stable unless readers encounter significantly complex or unfamiliar text. Our results align with the broader consensus that FFD reflects low-level, automatic processing unaffected by auditory distractions in familiar or predictable environments. Therefore, H3 was confirmed and the findings suggest that FFD is a robust measure of early visual processing that remains stable across auditory conditions, particularly when tasks are relatively simple and readers are familiar with the auditory environment.
4.2. Reading Performance in Text Comprehension
According to H4, participants were expected to achieve higher reading comprehension scores in the preferred music and silence conditions than in the non-preferred music and café noise conditions. This prediction was based on the idea that non-preferred music and café noise would increase cognitive load, impairing text comprehension, while preferred music and silence would allow for better cognitive focus and processing. The participants’ reading comprehension scores were statistically significantly lower in the non-preferred music condition than in the other conditions, supporting the hypothesis that non-preferred music increases cognitive load and has a negative impact on text comprehension. However, no statistically significant difference was observed between the comprehension scores obtained for the preferred music, café noise, and silence conditions. This suggests that preferred music and silence allow for comparable reading performance, and that café noise did not impair comprehension at a statistically significant level.
The findings align with those of several studies that highlight the disruptive effects of non-preferred music and the cognitive benefits of silence or preferred music. Many studies reported that non-preferred music impairs cognitive performance, while silence and preferred music enhance comprehension [
1,
8,
28]. Haapakangas et al. [
34] and Sweller et al. [
35] claimed that silence reduces cognitive load, enabling better focus and comprehension. Johansson et al. [
1] observed better comprehension under preferred music or silence compared to non-preferred or loud music. Haapakangas et al. [
34] and Que et al. [
6] suggested that moderate, predictable noise, such as café noise, has less detrimental effects than irregular, disruptive noise, which aligns with this study’s finding of no significant impact from café noise.
While this study found no statistically significant difference in comprehension between café noise, preferred music, and silence conditions, some prior studies indicate that café noise can impair cognitive performance. Perham and Sykora [
8] emphasized that background noise, including café noise, often disrupts cognitive tasks by increasing cognitive load. In the same vein, Sweller et al. [
35] suggested that auditory distractions, including café noise, could overwhelm cognitive systems. The lack of a statistically significant effect of the café noise in this study may be explained by the fact that a number of participants (20% as shown in
Table A2) reported familiarity with studying in environments with background noise. Familiarity is likely to have reduced the disruptive potential of café noise, allowing participants to adapt and maintain comprehension. Therefore, H4 was partially confirmed.
According to H5, reading time is expected to be longer in the non-preferred music and café noise conditions than in the preferred music and silence conditions. Our results showed that although reading time in the non-preferred music condition was slightly longer than in other conditions, the difference was not statistically significant. This means that participants were able to maintain a consistent reading pace despite the presence of non-preferred music.
The findings of this study partially align with the existing literature. A number of studies have shown that background music, particularly music that individuals do not prefer, can impede cognitive performance, leading to longer reading times [
3,
37]. However, other research studies suggest that the effect of noise, including non-preferred music, may vary depending on the individual’s ability to concentrate or the complexity of the reading task [
38]. In this study, the absence of a statistically significant difference in reading time among the preferred music, café noise, and silence conditions indicates that the participants may have employed effective strategies to mitigate potential distractions. Therefore, H5 was not confirmed.
4.3. Reading Attitudes in Text Comprehension
According to H6, participants were expected to report higher perceived difficulty and fatigue when reading under the non-preferred music and café noise conditions than when reading under the preferred music and silence conditions. This hypothesis is based on the premise that non-preferred music and café noise impose a higher cognitive load than the other two reading conditions (preferred music and silence), making the reading task more mentally demanding. Participants perceived the non-preferred music and café noise conditions as significantly more challenging and fatiguing than the preferred music and silence conditions. This indicates that non-preferred music and café noise are associated with increased cognitive load, causing participants to exert more mental effort during the reading task. Descriptive statistics revealed that the non-preferred music conditions frequently involved genres such as metal and heavy metal, characterized by high tempo, loud volume, and unpredictable rhythm. These attributes are likely to have disrupted focus and heightened cognitive demands, leading to increased fatigue and difficulty. In addition, as shown by the descriptive statistics, the café noise condition included background sounds typically found in public environments, such as background chatter, clinking utensils, and intermittent auditory distractions, which had a negative impact on the participants’ studying/reading. The unpredictable nature of these sounds may have increased the participants’ cognitive load and disrupted their reading focus, contributing to their increased perceptions of difficulty and fatigue. Conversely, preferred music and silence created more conducive environments for maintaining focus and reducing perceived mental effort.
The findings align with those of the existing literature. Cognitive load theory suggests that distractions such as non-preferred music and café noise impose additional cognitive demands, diverting attention from the primary task and causing increased fatigue and difficulty [
35]. Perham and Sykora [
8] reported that non-preferred music disrupts cognitive tasks by requiring individuals to exert more effort to filter out auditory distractions, leading to perceived fatigue and difficulty. The low levels of fatigue and difficulty reported by some of the participants under the preferred music condition align with studies demonstrating that preferred music fosters positive emotional states, reduces stress, and alleviates perceived task difficulty [
28]. Similarly, café noise, although often perceived as ambient and less intrusive, can still disrupt reading performance, especially when it involves background chatter, clinking sounds and variable auditory inputs [
34]. Studies suggest that the inconsistency and unpredictability of café noise forces readers to allocate additional cognitive resources to maintain focus, leading to increased perceptions of difficulty and fatigue [
39]. Despite this, café noise may be less detrimental than non-preferred music due to its more consistent acoustic profile and participants’ potential familiarity with such environments. Therefore, H6 was confirmed.
According to H7, the participants were expected to report higher perceived understanding, confidence, and immersion when reading under the preferred music and silence conditions than under the non-preferred music and café noise conditions. The results of the study showed that the participants reported higher levels of perceived understanding, confidence, and immersion in the preferred music and silence conditions. These environments supported cognitive engagement and task mastery, as evidenced by higher ratings in these areas. Perceived understanding, confidence, and immersion were statistically significantly greater in the preferred music and silence conditions than in the non-preferred music and café noise conditions, reflecting the participants’ ability to focus and engage deeply with the reading task in familiar or quiet auditory settings. Conversely, non-preferred music and café noise were associated with low ratings across all three measures, highlighting their disruptive effects on focus and task engagement.
The findings align with those in the existing literature supporting the positive impact of preferred music and silence on cognitive performance and subjective task engagement. Hallam et al. [
28] demonstrated that preferred music facilitates task engagement and immersion by fostering positive emotional states and reducing anxiety. This is consistent with the present finding that the participants felt more confident and immersed when listening to their preferred music. Sweller et al. [
35] proposed that silence reduces extraneous cognitive load, enabling a better allocation of cognitive resources to the task. This enhances understanding, confidence, and immersion, as observed in this study. The disruptive effects of non-preferred music are consistent with the findings of Perham and Sykora [
8], who demonstrated that unpredictable rhythms and a high intensity increase cognitive load, hindering focus and task confidence. In addition, Haapakangas et al. [
34] support the notion that café noise, while distracting, is less disruptive than non-preferred music but does not foster the same level of task engagement as silence or preferred music. Therefore, H7 was confirmed.
5. Conclusions
The findings of this study showed that visual measures, such as fixation duration, fixations count, TTFF and FFD, remained stable across the experimental conditions. This finding aligns with prior research that has highlighted the stability of early-stage visual processing in predictable environments [
13,
22]. The familiarity with background noise may have enabled the participants to maintain their focus, even when faced with non-preferred or noisy conditions [
40].
Furthermore, text comprehension was significantly impaired in the non-preferred music condition, in accordance with the cognitive load theory proposed by Sweller et al. [
35]. This theory posits that stimuli perceived as negative and aversive result in increased cognitive demands. In contrast, comprehension scores remained stable in the preferred music, café noise, and silence conditions, indicating that predictable or familiar auditory environments can reduce the disruptive effects of noise [
34]. For example, participant SF1 habitually listened to pop music while reading, a genre that was familiar to her and did not hinder the reading process, as reflected in her high performance on the comprehension test. In contrast, metal music, which she chose to listen to in the non-preferred music condition, was a genre she was not familiar with and had not previously listened to while reading. As a result, her performance was low under this condition. These findings are in accordance with prior studies which have demonstrated that preferred music and silence foster conditions conducive to learning [
1].
Moreover, the participants of this study indicated that the non-preferred music condition was perceived as significantly more challenging and fatiguing, thereby confirming the negative impact of distracting noise on task engagement [
8]. In contrast, the participants reported improved understanding, confidence, and immersion when exposed to preferred music and silence, which is in accordance with the positive effects of familiar auditory environments on emotional and cognitive states [
28]. Although café noise was found to be distracting, it was observed to have a reduced negative impact, which is likely attributable to its consistent and predictable characteristics [
39]. These findings highlight the significance of aligning the study environments with individual preferences. It can be concluded that preferred music and silence represent optimal auditory environments for reading, promoting both cognitive engagement and emotional well-being. In contrast, the presence of non-preferred music was found to significantly increase cognitive load and fatigue, underscoring the necessity to avoid aversive auditory stimuli during study sessions. The ability of participants to adapt to café noise indicates that consistent background sounds may be tolerable, particularly for those who are used to studying in public spaces.
This study contributes to the existing literature in a number of ways. Firstly, the reading conditions of preferred music, café noise, and silence selected in this study are commonly used by students; thus, the findings offer insight into how reading habits affect reading performance. Moreover, while previous studies have focused on a single type of music or noise background, contrasting it only with reading sessions conducted in silence [
41,
42], this study presents four reading conditions, with each of the three auditory conditions compared with the silent reading condition. Additionally, this study specifically examined the role of personal music preferences in reading, distinguishing between preferred and non-preferred music conditions, while previous research has primarily focused on factors such as the nature of the reading tasks, the type of music played, and the personality of the individuals involved [
5,
7,
8]. Furthermore, the eye-tracking methodology employed in this study, in which reading comprehension performance was assessed across four reading conditions, represents a significant contribution to the existing literature on the subject, since there is a lack of research investigating text comprehension while listening to music and tracking readers’ eye movements simultaneously.
However, the results of this study are not in accordance with the body of research that claims that text comprehension is enhanced in the presence of music rather than in its absence [
1,
6]. This discrepancy may be attributed to several limitations of our study. To date, no studies have examined text comprehension and eye movements while listening to different kinds of music and noise in the context of the Greek language. Unlike many previous studies that examined reading comprehension in English, this research focused on Greek-language texts, which may involve different syntactic structures, lexical properties, and cognitive processing demands. Furthermore, cultural reading habits and educational background may influence how Greek students engage with texts in the presence of background music.
Cultural background plays a significant role in reading habits, cognitive engagement, and text interpretation. Different cultures emphasize distinct reading strategies, such as rote memorization, inferential reasoning, or critical analysis. In the context of this study, Greek university students may approach reading differently from students in English-speaking countries, where previous research on background music and reading comprehension has been conducted. Additionally, exposure to background noise varies across cultures: some students may be accustomed to studying in noisy environments (e.g., cafés, communal study spaces), while others prefer quiet, isolated settings. These factors could influence how participants respond to auditory distractions like music or café noise during reading tasks.
Linguistic structure plays a crucial role in reading comprehension, influencing how individuals process and extract meaning from text. Greek and English differ significantly in morphology, syntax, and orthographic depth, which may affect reading strategies and cognitive load. Greek is a highly inflected language, where word endings change based on grammatical case, number, and tense, requiring readers to engage in more detailed morphological processing than in English. Additionally, Greek has a shallow orthography, meaning there is a strong correspondence between letters and sounds, whereas English has a deep orthography, with irregular spelling rules requiring greater reliance on whole-word recognition. These differences suggest that Greek readers may depend more on phonological processing, while English readers rely on lexical familiarity and contextual cues. Furthermore, Greek syntax allows for a flexible word order, which may require greater cognitive effort in sentence parsing, potentially influencing reading comprehension under auditory distractions. These linguistic differences highlight the need for cross-linguistic comparisons to determine whether background music and noise affect reading comprehension in a language-specific or universal manner.
Furthermore, the present study was conducted with a limited number of participants, all of whom were university students in Greece, and males were underrepresented in the sample. As a result, the findings cannot be generalized to readers of other languages or cultural contexts. This study was conducted as a pilot investigation with a small sample size (n = 10), which limits the generalizability of the findings. No formal power analysis was performed prior to data collection; the sample size was determined based on feasibility constraints and the exploratory nature of the research. Although we acknowledge these limitations, the primary objective was to test the experimental procedure and to refine the methodology for a subsequent main study. This future study will include a larger sample with a balanced number of males and females and a more comprehensive analysis, which will enhance the generalizability of the findings.
Future research should therefore examine in greater detail the type of music played while reading, as well as how specific characteristics of music (e.g., tempo, lyrics, genre) and noise variability affect reading performance and cognitive load. For example, future research could examine the effects of progressively increasing noise levels during the reading process, as well as the differential impact of exposure to speech-based versus instrumental music. Furthermore, future studies could investigate in greater detail how eye movements relate to the syntactic and semantic aspects of the texts and employ larger sample sizes in order to increase statistical power and improve the generalizability of findings. A within-subject experimental design would allow for a more controlled comparison across different auditory conditions, reducing variability due to individual differences in reading ability and music sensitivity.
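As one illustration of how such a within-subject design could control for order effects, the sketch below generates a balanced Latin square (Williams design) for four auditory conditions, so that each condition appears once in every serial position and immediately follows every other condition exactly once. Counterbalancing of this kind is an assumption made for illustration only; the procedure and the function name `balanced_latin_square` are hypothetical and are not taken from this study.

```python
def balanced_latin_square(conditions):
    """Return presentation orders forming a Williams design.

    Works for an even number of conditions: every condition occupies each
    serial position once and directly follows every other condition once.
    """
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this sketch handles an even number of conditions only")
    # First-row index pattern: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every index of the first row by one (mod n).
    return [[conditions[(idx + shift) % n] for idx in first] for shift in range(n)]


# Condition labels follow the four reading conditions named in the text.
for order in balanced_latin_square(
    ["preferred music", "non-preferred music", "café noise", "silence"]
):
    print(order)
```

Under this scheme, participants would be assigned to the four orders in rotation, so a sample size that is a multiple of four keeps the orders equally represented.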
Additionally, controlled exposure to varying noise levels could help determine whether background noise intensity (e.g., low, moderate, or high decibel levels) influences reading performance differently. Manipulating specific acoustic properties—such as music tempo, complexity, familiarity, or instrumental versus vocal content—could provide a more detailed understanding of how auditory input interacts with reading comprehension. Furthermore, cross-linguistic comparisons across languages with different orthographic and syntactic structures would help clarify whether the effects of background music and noise are language-specific or universal. These methodological refinements would contribute to a more comprehensive model of how auditory environments influence reading processes.