A Field Study of the Impact of Indoor Lighting on Visual Perception and Cognitive Performance in Classroom

Abstract: In this field study, a series of psychophysical tests were conducted to investigate the impact of indoor lighting on students' visual perception and cognitive performance. A typical classroom of Wuhan University (China) was fitted with tubular light-emitting diode (LED) sources and LED panel sources in two experiments, respectively. Under the two lighting environments, seventy-nine college students were invited to complete a group of visual tests, which included colour preference evaluations of fruit and vegetables and skin tone, perceptual judgement on the atmosphere of the lighting environment, a reading comfort assessment concerning different paper colours, a Karolinska Sleepiness Scale (KSS) test quantifying alertness, and the Anfimov test of attention (also concerning paper colours). These tests were conducted twice, before and after a two-hour self-study under each lighting environment, with the aim of investigating the impact of visual fatigue on visual perception and cognitive performance. It was found that the influence of indoor lighting was significant on skin preference and atmosphere perception, while no effects of lighting were observed on participants' alertness and attention. Meanwhile, the impact of visual fatigue was also found to be insignificant in this case. Interestingly, paper colour, rather than indoor lighting, was found to have a significant impact on the visual comfort of text reading. In addition, degree of proficiency significantly influenced the proofreading speed and accuracy of the subjects in the Anfimov test, which we believe should be taken into consideration when implementing similar tests in follow-up studies. The preliminary findings of this field study should provide a deeper understanding of how changes in classroom lighting contribute to visual perception and cognitive performance of occupants.


Introduction
In the past century and a half, many research projects have investigated the effect of lighting on the perception of the environment, contributing to a comprehensive understanding of the mechanism of visual perception. Studies aimed at finding a correlation between the spectral power distribution (SPD) of light and human visual perception, including colour fidelity [1,2], colour naturalness [3,4], colour discrimination [5,6], colour preference [7][8][9], lighting comfort [10,11], and the whiteness of lighting [12,13], have obtained positive results. More than fifty colour quality metrics involving objective and subjective aspects of perception have been proposed, including the Colour Rendering Index (CRI: Ra) developed by Commission Internationale de l'Eclairage (CIE) [1], Colour Quality Scale (CQS: Qa, Qf, Qp, Qg) [2] developed by National Institute of Standards and Technology (NIST), IES-TM-30 metrics (Rf and Rg) [14] developed by American National Standards Institute (ANSI) & Illuminating Engineering Society of North America, Memory Colour Rendering Index (MCRI) [15], Colour Discrimination Index (CDI) [16], and Gamut Volume Index (GVI) [17]. These metrics are essential and effective tools to inform good professional practice, for example in applied lighting design and product development in museums and art galleries [11,18], surgical applications [19,20], industrial inspection [20], and the retail environment [5,21].
In 2001, a third type of photoreceptor, the intrinsically photosensitive retinal ganglion cell (ipRGC), was discovered [22,23]. These novel cells are located in the retina of mammals and have been shown to have special nerve connections to the suprachiasmatic nucleus (SCN), which is the biological clock in the brain. The SCN, in turn, has a nerve connection with the pineal gland, which is responsible for the regulation of some types of hormones. The discovery of additional nerve connections from the ipRGC in the eye to the brain has contributed prominently to the understanding of many non-visual biological effects of lighting, for example, circadian rhythms [24,25], cortisol production [26], melatonin production [27], concentration [28,29], and alertness [27,30].
Over recent decades, researchers, lighting designers, and architects have realized that lighting design should consider both visual and non-visual aspects of lighting [31][32][33][34]. Based on this consensus, researchers have recently focused on the concept of human-centric lighting (HCL) [35][36][37][38][39], which aims to maximize the benefits of proper lighting and covers a broad set of technologies. Specifically, HCL provides evidence-based lighting solutions optimized for vision, performance, alertness, and general human health by balancing the visual, emotional, and biological benefits of lighting. In a recent work by Vinh et al. [35], taking selected spectral aspects of HCL into consideration, a preliminary usefulness metric was proposed to characterize the energy efficiency of light sources. This HCL metric comprised widely used measures of colour quality (e.g., CRI Ra), descriptors of brightness (e.g., CIE L mes [40]), as well as an indicator of the circadian effect (the so-called DIN melanopic factor [41]). Additionally, in the latest study by Duong et al. [38], which concentrated on the up-lighting method and freeform optics, a design of freeform lenses for HCL was developed. The design was shown to be beneficial in obtaining the maximum luminous uniformity over an indoor space, which might reduce the chance of myopia and improve health and mental wellness.
It is necessary, however, to recognize the limitations of what we know at the current stage [37]. For the visual effects of lighting, we know much about how light affects our visual capabilities and perceptions but, owing to variations among psychophysical studies, a consensus has not been reached on those topics. For the non-visual effects, we suspect that many parts of the human body are influenced by light, yet these influences are relatively unexplored. Thus, as many researchers have noted [34,35,37,42], a full range of lighting research, in both visual and non-visual domains, is still needed.
In this contribution, therefore, a field study has been carried out with the aim of exploring the impact of classroom lighting on the visual perception and cognitive performance of the occupants. Two types of luminaires, tubular LED and LED panel sources, were consecutively installed in a classroom in two experiments, both of which employed the same experimental protocol. Seventy-nine college students were recruited to conduct a series of visual evaluations and cognitive tasks under the two experimental lighting conditions. The tasks were conducted twice for every lighting environment, before and after a 2-h period of self-study, which was intended to investigate the impact of visual fatigue. According to recent studies, inappropriate selection of paper colour may lead to visual fatigue [43], asthenopia [44], and even some ocular diseases (e.g., keratitis and conjunctivitis [45,46]). Thus, investigating the impact of paper colour (white, yellowish-white, and green) on reading comfort and attention under different light sources was another purpose of this work. In addition, inspired by the error score variations found in the Farnsworth-Munsell 100 Hue Test (a typical measure of colour discrimination capability) caused by test repetition [47,48], we also investigated whether there were practice effects (i.e., impact of degree of proficiency) in the adopted psychophysical approaches.

Experimental Design
The experiments were approved by the ethics committee of the School of Printing and Packaging, Wuhan University. The project identification code is 250000210 (approved on 20 October 2019). Informed consent was obtained from all participants for being included in the study.
In this study, two experiments were consecutively conducted in one classroom at Wuhan University in Wuhan, China (30.53° N, 114.36° E). Both experiments were designed within the same experimental protocol, but with different luminaires (Experiment 1: tubular LED sources; Experiment 2: LED panel sources). Seventy-nine students (person-time) from Wuhan University were invited to participate in the research. All the experiments were conducted after 6:30 p.m. to avoid unintended variables caused by the presence of daylight.
Each experiment involved Sessions 1, 2, and 3, as shown in Figure 1. In Session 1, participants were asked to complete six tasks related to visual perception and cognitive performance. The first was the colour preference evaluation of a plate of multicoloured fruit and vegetables, followed by the visual evaluation of skin tone preference. These approaches are widely adopted in the measurement of colour quality of light sources [3,7,9]. The third task was to assess the atmosphere of the indoor lighting environment in terms of four attributes: Uncomfortable/Comfortable, Cool/Warm, Negative/Active, and Tense/Relaxed. Next, the participants were asked to complete the proofreading task of the Anfimov test [49][50][51][52]. The visual comfort assessment of text reading associated with three differently coloured pieces of paper (white, yellowish-white, and green) was conducted next. The last task was the Karolinska Sleepiness Scale (KSS) test [53]. The details of the six tasks will be described in Section 2.4.
Appl. Sci. 2020, 10, x FOR PEER REVIEW

In Session 2, the participants were instructed to conduct a 2-h period of self-study in the classroom, during which only paperwork with white paper was allowed, i.e., no light-emitting devices (e.g., computers, tablets, phones) were allowed. Such an arrangement was to make sure that the only light in the classroom came from the experimental luminaires. It should be noted that this study was carried out in December 2019, when the final exams of the university were approaching. Thus, the participants were able to concentrate on revision for their final exams, e.g., reviewing textbooks and doing exercises.
Session 3, which adopted the same experimental procedure as Session 1, was performed after the 2-h self-study period. Thus, each participant completed the six visual tests twice under the same lighting environment, before and after the self-study period, respectively. As noted above, this protocol aimed to investigate the impact of visual fatigue on the results of the tests. In addition, one group of experienced participants who had engaged in Experiment 1 and a second group of participants without such experience were both involved in Experiment 2. This arrangement aimed to verify any possible practice effects on the results of the psychophysical tests. Strictly speaking, the impact of visual fatigue revealed by any comparison between the results of Session 1 and Session 3 was confounded by the impact of another form of practice effect, that is, the instant experience obtained in Session 1. In this study, we assumed the instant practice effect between Session 1 and Session 3 to be negligible according to the insignificant difference between experienced participants and inexperienced participants in Experiment 2 (see Section 3). Therefore, only the possible impact of visual fatigue caused by the 2-h self-study between Session 1 and Session 3 is discussed in the following sections. In total, it took approximately three and a half hours to conduct each experiment.

Experimental Setup
All the experiments took place in a classroom (Figure 2) at Wuhan University, which was equipped with tubular LED and LED panel sources for Experiment 1 (Exp.1) and Experiment 2 (Exp.2), respectively. Specifically, six classic LED tubes were installed at a height of 3 m for Exp.1, and nine commercially available LED panels were installed at the same height for Exp.2. An X-Rite i1 Pro2 spectrophotometer (measurement geometry: 45°/0°; spectral range: 380-730 nm; wavelength step: 10 nm; measuring aperture: 3.5 mm) was used to measure the SPDs. Before each experiment, all the luminaires were switched on for at least 30 min to ensure they had stabilized. After that, an X-Rite i1 Pro2, which had been calibrated by measuring a calibration white tile, was placed on the measurement plane (2 m directly below the light sources) with a standard white tile in the geometry of 45°/0° to obtain SPDs. The SPDs of the six tubular LED sources showed some small variation, probably because they were not new. The nine LED panel sources were previously unused and had consistent SPDs, as shown in Figure 3.
The classroom is located on the ground floor of a four-story building and has an area of 101.2 m² (11.5 m × 8.8 m). It can accommodate 36 students with partitions between each of the desks, ensuring that there was no interference amongst participants during the tests. The illuminance at the centre of each desk was measured with a Testo 540 illuminance meter (measurement range: 0-99,999 lux, measurement accuracy: ±3%), as shown in Table 1. The measuring devices adopted in this study were recalibrated 15 days before the experiments. All the photometric and colorimetric parameters are based on measurements taken prior to the start of formal experiments. Figure 4 visualizes colour gamut properties of the experimental sources.

Subjects
A total of seventy-nine volunteers (person-time: 36 in Exp.1 and 43 in Exp.2, of whom 23 took part in both experiments) were invited to participate in this study. All the subjects were Chinese students at Wuhan University, and had passed the Ishihara Colour Vision Test. The 56 individual subjects were mainly from eastern (n = 30) and southern (n = 23) China, with a few (n = 3) from western China. In Exp.1, there were 36 subjects (18 males and 18 females) aged between 17 and 19 years (mean = 18.2 years, SD = 0.60). None were aware of the purpose of the research.
In Exp.2, a group of 23 subjects (Group A, 11 males and 12 females, aged between 17 and 19 years, mean = 17.9 years, SD = 0.62) who had taken part in Exp.1 were recruited again. In addition, a second group of 20 naïve subjects (Group B, 5 males and 15 females, aged between 16 and 26 years; mean = 22.0 years, SD = 2.48) were invited to participate. Since there were only 36 desks in the classroom, the two groups of subjects participated in Exp.2 on two separate nights. To be specific, Exp.2 with subjects from Group A was conducted one week after Exp.1, while the subjects in Group B participated in Exp.2 two days later. As noted above, this arrangement was aimed at investigating the possible impact of any practice effect (or degree of proficiency) upon the results of the psychophysical tests.

Experimental Procedure
The procedure for Exp.1 and Exp.2 is shown in Figure 1 above. Before the formal experiments, all the luminaires were switched on for at least 30 min to ensure they had stabilized. Upon arrival, the subjects randomly chose a seat and signed an informed consent form. A general information questionnaire was issued to collect demographic and lighting knowledge information. During that time, the experimenter read the instructions and answered any questions raised by the subjects.
The subjects then consecutively performed the six tasks in Session 1. The first task was to observe a plate of multicoloured fruit and vegetables (including an apple, an orange, a lemon, an onion, a green date, a cucumber, and some cherry tomatoes; Figure 5), and make a subjective preference rating. Note that the fruit and vegetables adopted in Exp.1 and Exp.2 were not the same, because the interval between the experiments was too long for the items to be preserved. In Exp.2, therefore, the fruit and vegetables were selected to be as similar to those used in Exp.1 as possible. A 7-point categorical judgement method was employed to quantify the subjective preference of the illuminated objects. Specifically, the seven scores (−3, −2, −1, 0, 1, 2, 3) corresponded to strongly dislike, moderately dislike, slightly dislike, neutral, slightly like, moderately like, and strongly like, respectively. This method was also adopted for the preference evaluation of the skin tone of the back of the hand of each subject, as shown in Figure 5.
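The 7-point categorical scale described above can be encoded as a simple lookup. The following is a minimal sketch only: the function names and the example ratings are hypothetical and are not data or code from this study.

```python
from statistics import mean

# Score-to-label mapping taken from the 7-point scale described in the text.
PREFERENCE_LABELS = {
    -3: "strongly dislike",
    -2: "moderately dislike",
    -1: "slightly dislike",
    0: "neutral",
    1: "slightly like",
    2: "moderately like",
    3: "strongly like",
}

# Reverse lookup: category label -> numeric score.
LABEL_TO_SCORE = {label: score for score, label in PREFERENCE_LABELS.items()}


def encode(label: str) -> int:
    """Convert a category label to its numeric score (hypothetical helper)."""
    return LABEL_TO_SCORE[label.strip().lower()]


def summarize(labels: list[str]) -> float:
    """Mean numeric score of a list of category labels."""
    return mean(encode(label) for label in labels)


# Hypothetical ratings from three subjects, for illustration only.
example_ratings = ["slightly like", "neutral", "moderately like"]
print(summarize(example_ratings))
```

Treating the categories as equally spaced integers is an assumption of this sketch; an ordinal analysis may be preferable depending on the statistical test used.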

In the third task, subjects were asked to assess the atmosphere of the lighting environment according to four perceptual attributes: Uncomfortable/Comfortable, Cool/Warm, Negative/Active, and Tense/Relaxed. Figure 6 shows the questionnaire used to record these measurements, in which a score of +3 corresponded to the highest degree of comfort, warmth, activeness, and relaxation, respectively.
The subjective evaluation of the lighting was followed by the Anfimov test, which was designed to measure attention and mental workability [49][50][51][52]. As can be seen in Figure 7, the test comprises 1200 uppercase letters printed as a justified paragraph (without hyphenation) on a 210 × 297 mm (A4) sheet of paper. The paragraph consists of 8 letters (A, B, C, E, H, K, N, X) such that each letter randomly appears 150 times (organized into 30 lines with 40 letters in each line). The text was presented in 14 pt Times New Roman font and contained 20 pairs of the two given letters EK (in sequence) whose positions were randomized and counterbalanced. Note that all the Anfimov tests conducted in this study had different text paragraphs with different arrangements of letters. During the test, the subjects were instructed to read the letters one by one (beginning at the upper left) and to underline the letter K in the preset pair EK, all within 2 min. In addition, they needed to mark the last letter they read when the time was up. Each subject had to complete three Anfimov tests in Session 1, each using a differently coloured piece of paper (white, yellowish-white, and green). The order of the three Anfimov tests was randomized and a break of 1 min was allowed between each test.
After the Anfimov test, the reading comfort assessment was performed, during which the subjects had to read the questionnaires used in the Anfimov test (Figure 7) and conduct the assessment of reading comfort. These questionnaires contained sentences in Chinese and English (including the uppercase letters in the Anfimov test), as well as Arabic numerals. The 7-point judgement method was again adopted with a score of +3 relating to the most comfortable reading experience.
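The constraints on the Anfimov sheet (1200 letters, eight letters appearing 150 times each, 30 lines of 40 letters, and 20 EK pairs) can be met with a simple randomized construction. The text does not say how the sheets were actually produced, so the sketch below is only one possible approach: it shuffles the letters and then applies random swaps until the pair count is right. The function names are hypothetical, the rule that no EK pair may straddle a line break is an assumption, and the counterbalancing of pair positions mentioned in the text is not modelled.

```python
import random
from collections import Counter

LETTERS = "ABCEHKNX"      # the eight letters named in the text
PER_LETTER = 150          # occurrences of each letter
LINES, LINE_LEN = 30, 40  # 30 lines x 40 letters = 1200 letters
TARGET_PAIRS = 20         # required number of EK pairs


def pair_stats(seq):
    """Return (EK pairs inside a line, EK pairs straddling a line break)."""
    inside = straddling = 0
    for i in range(len(seq) - 1):
        if seq[i] == "E" and seq[i + 1] == "K":
            if (i + 1) % LINE_LEN == 0:  # K would start the next line
                straddling += 1
            else:
                inside += 1
    return inside, straddling


def cost(seq):
    """Distance from the target: wrong pair count plus straddling pairs."""
    inside, straddling = pair_stats(seq)
    return abs(inside - TARGET_PAIRS) + straddling


def make_sheet(seed=None):
    """Build one sheet as a list of 30 strings of 40 letters each."""
    rng = random.Random(seed)
    seq = list(LETTERS * PER_LETTER)
    rng.shuffle(seq)
    current = cost(seq)
    # Random swaps keep the letter counts fixed; a swap is kept only if
    # it does not move the sheet further from the target.
    while current > 0:
        i, j = rng.randrange(len(seq)), rng.randrange(len(seq))
        seq[i], seq[j] = seq[j], seq[i]
        new = cost(seq)
        if new <= current:
            current = new
        else:
            seq[i], seq[j] = seq[j], seq[i]  # undo the swap
    return ["".join(seq[r * LINE_LEN:(r + 1) * LINE_LEN]) for r in range(LINES)]


sheet = make_sheet(seed=1)
print("\n".join(sheet[:3]))  # first three lines of the generated sheet
```

Calling `make_sheet` with different seeds gives different letter arrangements, in line with the note that every test in the study used a different paragraph.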
At the end of Session 1, the subjective level of sleepiness was measured by the KSS test, a classical approach whose ratings correspond well with electroencephalography (EEG) measurements [53]. The KSS test contains nine points, varying from extremely alert (number 1) to very sleepy (number 9). On this nine-point scale, the subject indicates which level best reflects the psychophysical state experienced in the previous 10 min [54]. Table 2 gives an overview of the corresponding scales, the original descriptions, and the translated Chinese descriptions, as used in this study. Approximately 30 min were required to complete Session 1.

Table 2. KSS scales and descriptions.
Scale | Description in English | Description in Chinese
1 | Extremely alert | 极度清醒
2 | Very alert | 非常清醒
3 | Alert | 清醒
4 | Rather alert | 一般清醒
As described in Section 2.1, a 2-h period of self-study (Session 2) was performed after completion of Session 1, during which all the subjects focused on reviewing textbooks and doing exercises for their college final examinations. In the final session, Session 3, the six tests used in Session 1 were repeated to investigate the impact of the visual fatigue caused in Session 2.

Results of Colour Preference and Lighting Environment Perception
The overall results of the colour preference (fruit and vegetable; skin tone) and atmosphere perception (Uncomfortable/Comfortable, Cool/Warm, Negative/Active, and Tense/Relaxed) experiments are shown in Figure 8. The rating trends for the four scenarios (i.e., two sessions in two experiments) are consistent and the scores for the scenarios in Exp.2 are significantly higher than the scores in Exp.1. According to a Mann-Whitney U test, there is no significant difference between the results for the experienced (Group A) and the inexperienced subjects (Group B) in Exp.2, in terms of the six aspects (p values > 0.05, with a range from 0.12 to 0.89), which implies that any practice effect for these visual attributes is very small. Thus, in this section, the results of Group A and Group B in Exp.2 were combined for analysis.
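For readers unfamiliar with the Mann-Whitney U test used above to compare Group A and Group B, the following is a minimal sketch of the statistic for rating data with ties. It uses a tie-corrected normal approximation without continuity correction, which is an assumption; the study does not state which variant or software was used for this test, and exact p values from a statistics package may differ for small samples. The ratings below are hypothetical placeholders, not data from the experiment.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of sample x, approximate two-sided p value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                            # all observations identical
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    return u1, p


# Hypothetical 7-point ratings (-3..+3) for two groups; placeholders only.
group_a = [1, 2, 0, 1, 3, 2, 1]
group_b = [0, 1, 1, 2, 0, 1, 2]
u, p = mann_whitney_u(group_a, group_b)
print(u, round(p, 3))
```

A p value above 0.05 from such a comparison would be read, as in the text, as no detectable difference between the two groups.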

Table 2 (continued).
Scale | Description in English | Description in Chinese
9 | Very sleepy, great effort to keep awake, fighting sleep | 有困意,且需要极大的努力保持清醒

The statistical significance (α = 0.05) of the effects of lighting environment, test trial (i.e., the first trial in Session 1 and the second trial in Session 3), and gender, as well as their interaction effects, was analysed by multivariate analysis of variance (MANOVA) using SPSS software. The results are listed in Table 3 and show that the lighting environment has a significant effect (α = 0.05) on all aspects except the preference rating of the fruit and vegetables, whereas no significant effects of test trial or gender were found.
Specifically, the insignificant impact of test trial in Table 3 indicates that the visual fatigue caused by the two-hour self-study period in Session 2 did not markedly influence the visual judgements. It is noteworthy that the p values for gender in terms of fruit and vegetable preference, skin preference, Uncomfortable/Comfortable, and Negative/Active are close to the significance threshold (α = 0.05), which suggests the need for further investigation of gender differences in future work. None of the remaining effects, i.e., the two-way and three-way interactions among the three fixed factors (lighting environment, test trial, and gender), was significant (p values were between 0.14 and 0.97, with a mean of 0.73). After the Bonferroni correction (α = 0.05/6 = 0.0083) to account for the multiple significance tests, lighting environment remained significant for skin preference, Uncomfortable/Comfortable, and Cool/Warm. The partial eta-squared values (η²) indicate the effect size of each factor (Table 3). For most of the visual attributes, lighting environment had the largest effect on the subjects' evaluations, since its η² values were larger than those for test trial and gender (the exception being the preference for fruit and vegetables).
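The Bonferroni arithmetic above (0.05 divided across six visual attributes) can be shown in a few lines. This is a sketch only: the function name `bonferroni_flags` and the attribute names are hypothetical, and the p values are placeholders rather than the values reported in Table 3.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return the corrected threshold and, per test, whether p falls below it."""
    threshold = alpha / len(p_values)   # 0.05 / 6 is roughly 0.0083
    flags = {name: p < threshold for name, p in p_values.items()}
    return threshold, flags


# Placeholder p values for six attributes (NOT the values in Table 3).
example_p = {
    "attr_1": 0.001,
    "attr_2": 0.004,
    "attr_3": 0.012,   # below 0.05 but above the corrected threshold
    "attr_4": 0.030,   # below 0.05 but above the corrected threshold
    "attr_5": 0.200,
    "attr_6": 0.600,
}
threshold, flags = bonferroni_flags(example_p)
print(round(threshold, 4))
print(flags)
```

The example shows how an effect can be significant at α = 0.05 yet fail the corrected threshold, which is the situation the text describes for some of the attributes.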
As shown in Figure 8, the preference ratings of fruit and vegetables differ from those of skin tone. Such a result is consistent with the findings of Tang et al. [55], who evaluated the differences in appreciation for fruit and vegetables, packaging materials, and skin tone of the hand. These results might be attributed to the difference in colour saturation enhancement between multicoloured objects (fruit and vegetables) and a single-coloured object (skin tone). In addition, the fact that, within one experiment, each subject rated their own skin tone but the same group of fruit and vegetables may also have played a role.
As stated earlier, the lighting environment of Exp.2 was perceived to be more comfortable, active, relaxed, and warmer. Note that in Exp.2 the light sources provided higher illuminance (Exp.1: ~150 lux; Exp.2: ~300 lux) and lower CCT (Exp.1: ~6600 K; Exp.2: ~5100 K). In related work by Hsieh et al. [56], similar results were obtained: indoor lighting with a CCT of 5000 K and relatively high illuminance was perceived as more comfortable, energetic, relaxed, and warmer than lighting with a CCT of 6500 K and relatively low illuminance. Current knowledge clearly does not cover the wide variation in practical lighting conditions. Thus, we believe further research on the interactive effect of illuminance and CCT should be encouraged, as it would enhance the understanding of human visual perception under different lighting conditions.

Results of Anfimov Test
In earlier studies, the Anfimov test has been commonly used to measure cognitive performance under different classroom lighting conditions [57][58][59]. To our knowledge, however, the impact of a practice effect or the degree of proficiency on the test results has not been investigated. Therefore, to investigate this issue, different groups of subjects with and without test experience in Exp.1 were invited to participate in Exp.2 (i.e., Group A and Group B). Note that results of the repeated tests within a defined experiment (i.e., data from Session 1 and Session 3) could not be used to validate the practice effect, since such repetition was confounded by visual fatigue caused by the two-hour self-study period in Session 2.
The number of letters read per minute was used as a measure of proofreading speed, and the proofreading accuracy (PA) was calculated as follows [52]:
PA = γ/σ
where γ is the number of times the subject correctly underlined the letter K in the given pair of letters, EK, and σ is the total number of times the letter K should have been underlined in the assessed text (i.e., the text that had been read by the subject).
Figure 9 illustrates the comparison of the Anfimov test results derived from experienced subjects (Group A) and inexperienced subjects (Group B) in Exp.2. In each experiment, two trials (i.e., tests in Session 1 and Session 3) were conducted. As can be seen, the subjects in Group A, with higher proficiency in the Anfimov test, always performed better in proofreading speed and accuracy than the subjects of Group B. This indicates that the Anfimov test is heavily influenced by the practice effect, which must be taken into consideration when implementing the test in further studies. In addition, the difference between the test results of Session 1 and Session 3 was not clear-cut in either experiment. A possible explanation is that, although subjects might perform better in the second round of tests (i.e., Session 3), the visual fatigue caused by the two-hour study in Session 2 offset that improvement.
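The two Anfimov measures defined above can be computed from a marked sheet as sketched below. The sketch assumes that PA is the plain ratio γ/σ implied by the definitions of γ and σ (multiply by 100 for a percentage), that the sheet is stored as one flat string, and that positions are 0-based indices; the function name `score_anfimov` and the toy sheet are hypothetical.

```python
def score_anfimov(text, last_read_index, underlined, minutes=2.0):
    """Return (letters read per minute, proofreading accuracy).

    text            : the sheet as one flat string of letters
    last_read_index : 0-based index of the last letter the subject marked as read
    underlined      : 0-based indices of the letters the subject underlined
    """
    read = text[:last_read_index + 1]          # the assessed text
    # Positions of every K that directly follows an E in the assessed text.
    targets = {i + 1 for i in range(len(read) - 1)
               if read[i] == "E" and read[i + 1] == "K"}
    gamma = len(targets & set(underlined))     # correctly underlined K
    sigma = len(targets)                       # K that should be underlined
    speed = len(read) / minutes
    accuracy = gamma / sigma if sigma else None  # undefined if no target was read
    return speed, accuracy


# Toy sheet: targets sit at indices 2, 5 and 8; the subject stopped at index 6.
toy = "AEKBEKCEKX"
speed, accuracy = score_anfimov(toy, last_read_index=6, underlined={2, 3})
print(speed, accuracy)   # 7 letters in 2 min; 1 of 2 reachable targets found
```

Restricting σ to the text actually read keeps accuracy independent of speed, which matches the definition of the assessed text given above. Wrongly underlined letters (index 3 in the toy example) do not lower PA under this definition.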
The mean values of the subjects' proofreading speed and accuracy in Exp.1 and Exp.2 are shown in Figure 10. To remove the influence of the practice effect (proficiency), only the results of Group B in Exp.2 and the data of Exp.1 are shown. There is little difference between the results of Exp.1 and Exp.2 in most of the six comparisons (3 paper colours × 2 sessions). The difference in proofreading accuracy for black text on yellowish-white paper (Figure 10, right) is relatively larger. However, as verified by the Mann-Whitney U test, there is no significant difference between Exp.1 (~6600 K, 150 lux) and Exp.2 (~5100 K, 300 lux) in any of the six comparisons (p values > 0.05). That is, there is no significant effect of lighting environment on the Anfimov test results. Weng et al. [57] found similar results, i.e., there was no significant difference between the results obtained under lighting conditions of 5000 K, 300 lux and 6500 K, 150 lux. In addition, the Mann-Whitney U test indicates that test trial, paper colour, and gender have no statistically significant influence on the results of the Anfimov test (p values > 0.05).
Figure 10. The means of the subjects' proofreading speed and accuracy in Exp.1 and in Exp.2 after removing the practice effect. The error bars depict the 95% confidence intervals.

Results of Reading Comfort Assessment
According to the results of the Mann-Whitney U test, no significant difference was found between the reading comfort ratings of experienced subjects (Group A) and inexperienced subjects (Group B) in Exp.2. Thus, no practice effect was found and the data in Exp.2 were combined.
The effects of paper colour, lighting environment, test trial, and gender on reading comfort were investigated using a MANOVA test (Table 4). The hypothesis was that paper colour, lighting environment, and their interaction might have a significant impact on the rating of reading comfort. As shown in Table 4, however, only paper colour exhibited a significant influence (p < 0.001, with the largest η² value), and the impact of lighting was insignificant. The MANOVA results also show that neither gender nor test trial (including their interaction effects) had a significant effect. Thus, it can be concluded that visual fatigue did not have a significant impact on the reading comfort assessments. It should also be noted that the p value for gender was only slightly higher than 0.05 (0.052), which is consistent with the results in Section 3.1 above. This finding further suggests that the effect of gender should not be underestimated and deserves further investigation.
The post hoc test for the significant main effects (paper colour) revealed that the comfort ratings amongst different paper colours were significantly different at the 95% confidence level: p < 0.001 for white paper vs. yellowish-white paper; p = 0.005 for white paper vs. green paper; p < 0.001 for yellowish-white paper vs. green paper.

Figure 11 illustrates the average comfort rating scores and the results of a Wilcoxon signed-rank test, which was used to further verify the impact of paper colour on perceived reading comfort in Exp.1 and Exp.2. The results showed that the yellowish-white paper was associated with the most comfortable reading experience (p values < 0.05), while the white paper was rated as the least comfortable.
Figure 11. Bar chart of the average comfort rating scores with the results of the Wilcoxon signed-rank test (only p values lower than 0.05 are marked).
Another finding of interest is that the average comfort rating for green paper increased significantly after the 2-h self-study session in Exp.1 (p = 0.007), whereas in Exp.2 the increase was present but not significant. This result could be related to an interaction between the lighting environment and the visual perception of green paper, which should be clarified in future work.

Results of KSS Test
Figure 12 gives an overview of the subjects' alertness as monitored by the KSS test. Again, there were no statistically significant differences between the KSS ratings of experienced subjects (Group A, Exp.2) and inexperienced subjects (Group B, Exp.2), as tested by a Mann-Whitney U test (Session 1: p = 0.26; Session 3: p = 0.89). Thus, for the KSS test, all the data in Exp.2 were combined.
As shown in Figure 12, the average alertness was always located between "some signs of sleepiness" (number 6) and "rather alert" (number 4). Surprisingly, the average KSS ratings of Session 1 and Session 3 were almost identical, which indicates that the visual fatigue caused by the two-hour self-study period in Session 2 did not markedly affect the psychophysical state (sleepiness) of the subjects. This was unexpected and might be due to the intense concentration shown by the subjects during Session 2. As noted earlier, the subjects were busy preparing for their final exams when this experiment was carried out. Therefore, during Session 2, all the subjects worked very hard and thus stayed alert. In further work, it would be of interest to conduct a comparative experiment in which the students are not under any pressure (e.g., during relaxed leisure reading). Despite this, we believe this study has simulated a common and meaningful case for classroom lighting.
As shown in Table 5, there was no significant impact of lighting environment or gender (including interaction effects) on the subjects' alertness (p > 0.05). Similar results were obtained by Linhart and Scartezzini [60], who compared two lighting environments (average illuminance 232 vs. 352 lux, CCT not reported) and found no measurable influence of lighting on the participants' alertness as quantified by the KSS test. Viola et al. [61], however, reported that self-reported alertness in the KSS test improved under blue-enriched white light with a higher CCT. There are two possible explanations for this inconsistency. First, the difference in CCT between Exp.1 and Exp.2 in our study may not have been large enough to affect alertness. Specifically, the average CCT values in Exp.1 and Exp.2 were 6600 and 5100 K (∆CCT = 1500 K), respectively, while in Viola et al. [61] the test CCTs were 4000 and 17,000 K (∆CCT = 13,000 K). Second, the interaction between CCT and illuminance may also have contributed to the insignificant differences in KSS results. As shown in Table 1, the CCT of Exp.1 was higher but the illuminance of Exp.2 was higher (approximately 300 vs. 150 lux).
In summary, a significant impact of classroom lighting on visual perception (i.e., skin preference and the multi-dimensional evaluation of lighting atmosphere) was found, while no measurable effects of lighting were observed on the participants' reading comfort, attention, or alertness. The impact of visual fatigue generated by a two-hour, high-intensity self-study period was insignificant. The influence of proficiency on the Anfimov test was demonstrated, which suggests that the practice effect should be fully considered when using this or similar tests (e.g., Landolt rings [60] or the D2 test [62]). A between-subject design in the experimental protocol would help to address this.
In addition, experimental control was imperfect owing to the inherent limitations of any field study [42]. Thus, as suggested by recent researchers [42,63], multiple statistical tools were used to support the conclusions.

Conclusions
This paper describes a field study in which the impact of classroom lighting on students' visual perception and cognitive performance was investigated. Seventy-nine college students were tested on aspects of colour preference, atmosphere perception, reading comfort, self-reported alertness, and attention. A significant influence of the lighting environment on skin preference and atmosphere perception (i.e., Uncomfortable/Comfortable, Cool/Warm, Negative/Active, and Tense/Relaxed) was found, while no measured effects of indoor lighting were observed on reading comfort, alertness, and attention. In addition, the influence of visual fatigue generated by a two-hour self-study period was found to be insignificant. Paper colour was demonstrated to be important for reading comfort and the practice effect on the proofreading speed and accuracy of the Anfimov test was also demonstrated. These findings should help to further understand the impact of classroom lighting on the visual perception and cognitive performance of college students.

Conflicts of Interest:
The authors declare no conflict of interest.