1. Introduction
Attention, being central to cognition [1], refers to selectively processing information relevant to the current task while ignoring irrelevant information [2]. Depending on the direction of information flow, attention can be divided into two types: voluntary attention and reflexive attention [3,4]. The former, under top-down control, can direct attention to goal-related positions [5,6], while the latter, under bottom-up control, is often referred to as exogenous attention because it is thought to be triggered by external stimuli [3,6]. Attention operates in different modalities, including vision and audition, and at various stages of brain function, from the processing and perception of information to the final behavioral response [7]. Previous studies have found that voluntary attention and reflexive attention compete for control of the focus of attention, interacting in a push–pull fashion [2], in higher vertebrates including birds [8], mammals [9], non-human primates [10] and humans [11]. For example, behavioral evidence has shown that a sudden alarm call will cause Japanese great tits (Parus minor) to stop searching for food and quickly glance at their surroundings [12].
For many animal species (including humans), multiple cortical and subcortical structures engage in auditory attention, and neural mechanisms of attention select the information that can gain access to the brain networks making cognitive decisions [1]. Both forebrain and midbrain networks contain specialized neural circuits that process the highest priority information at each stage for decision-making. The former selects information, based on task demands, from all available sources, including sensory input, memory stores, and plans for action, and then assigns attention to stimulus features, sensory modalities, objects, locations, or memory stores [1]. Conversely, the latter is concerned only with the relative priorities of locations, based on the stimulus’ physical salience and its behavioral relevance, and assigns spatial attention to the highest priority location [1]. Moreover, overlapping brain networks, such as the fronto-parietal network, can be activated by both bottom-up triggered and top-down controlled auditory attention to pitch [13,14,15], suggesting that the information involved in the two types of attention is integrated in the brain. However, it is not yet clear whether voluntary attention and reflexive attention exist in lower vertebrates.
For most anuran species, vocal communication is the most important medium for reproductive success and social interactions [16,17]. Generally, many species gather in choruses to attract conspecific mates. Therefore, the acoustic environment of a chorus can be very complex due to high levels of background noise, vagaries of the spatial distribution of males, intense competition between males, and temporal overlap among the advertisement calls of rivals. The noisy social environments created by a chorus may affect communication efficiency [18]. For example, overlapping calls might obscure fine acoustic attributes of calls and further influence signal selectivity and decision-making in females [19,20]. Accordingly, detecting conspecific vocalizations in choruses and responding to them correctly is a major challenge for receivers [17,21,22,23]. The solution adopted by males of some anuran species is to use selective attention to adjust note or call timing according to only the loudest (or nearest) one or two neighbors in the chorus while ignoring the notes or calls of other individuals [24,25]. Both call alternation and call synchrony may result from a neural process that resets a male’s call timing after perception of a rival’s call [26,27]. Through this “inhibitory-resetting” mechanism of call timing, a male can increase the likelihood of occupying a leading position relative to his neighbors, which allows him to compete effectively and attract the attention of females because of the precedence effect, an inherent property of the auditory system in vertebrates [28,29]. In fact, computer modeling has demonstrated that both the “inhibitory-resetting” mechanism and selective attention may be favored by selection when female mate choice is biased by the precedence effect [27]. In addition, a sudden sound, such as a loud human voice, causes calling frogs to stop calling. It is therefore reasonable to assume that both voluntary attention and reflexive attention might engage in auditory perception in anurans; however, it remains largely unknown whether these two types of attention are reflected in brain activity in anurans.
An event-related potential (ERP) is the measured brain response to a specific event; its amplitudes and latencies can be used to examine the efficiency and time course of information processing in the brain [30]. In humans, the main components of the auditory ERP include N1, P2 and P3, with peaks at about 100, 200 and 300 ms after stimulus onset, respectively [31,32,33]; among these, N1 relates to the attention of the subject and is sensitive to the physical features of the stimulus [33,34]. In addition, the stimulus preceding negativity (SPN), isolated from the contingent negative variation, is regarded as a cognitive component because its amplitude gradually increases as the stimulus approaches. Therefore, the SPN can be used as a measure of expectation under voluntary attention and as a tool for research on the impact of uncertainty [35,36,37]. Interestingly, human-like auditory ERP components, with latencies that differ across species for each component, have been identified in monkeys [38], cats [39], dolphins [40], rabbits [41], rats [42] and frogs [43,44,45,46,47]. Because important neuroanatomical features, including a set of brain structures that attention depends on, have been conserved during vertebrate brain evolution [48,49,50], similar ERP components across different species may indicate similar brain functions to some extent.
The Emei music frog (Nidirana daunchina) is a typical seasonal breeder. Males of this species produce advertisement calls from inside and outside underground burrows in the breeding season [51]. Calls produced from inside burrows are highly sexually attractive to conspecific females because the call acoustics are modified by the resonant properties of the burrows, while calls produced from open fields are weakly sexually attractive. Previous behavioral and electrophysiological studies have revealed that auditory perception in this species might recruit selective attention [43,51,52]. For example, females prefer inside calls to outside ones in phonotaxis tests, while males are more likely to compete vocally against inside calls than against outside ones, congruent with the findings that voluntary attention may be involved in anurans’ auditory perception [24,25]. At the electrophysiological level, inside calls evoke significantly greater N1 amplitudes than outside calls when the two types of calls are played back in a random sequence [43], suggesting that reflexive attention may exist in music frogs. Based on these studies, we hypothesized that auditory perception in this species depends on the combination of voluntary attention and reflexive attention. To test this, we used a binary playback paradigm with silence period replacement and an equiprobable random playback paradigm to explore these two types of attention, respectively. During the broadcasting of acoustic stimuli related to breeding or survival, electroencephalogram (EEG) signals were synchronously collected from both sides of the telencephalon, diencephalon and mesencephalon, and the amplitude and latency were acquired for each ERP component. We predicted that: (1) if voluntary attention exists, the SPN amplitudes elicited in the telencephalon during the experiments recruiting voluntary attention only would be greater than those in the mesencephalon, because of top-down control in voluntary attention; (2) on the contrary, if reflexive attention exists, the N1 amplitudes evoked in the mesencephalon during the experiments recruiting reflexive attention only would be greater than those in the telencephalon, because of bottom-up control in reflexive attention; and (3) if auditory perception depends on both voluntary attention and reflexive attention, the predictable stimuli would evoke both SPN and N1.
2. Materials and Methods
2.1. Animals
Sixteen frogs (8 females and 8 males) were captured during their reproductive season in the Emei mountain area of Sichuan, China. The animals were separated by sex in two opaque plastic tanks (54 × 40 cm and 33 cm deep), which were placed in a room under a 12:12 h light–dark cycle (lights on at 08:00). The temperature and humidity in the room were controlled at 23 ± 1 °C and 79.3 ± 8.5%, respectively. At the time of surgery, the mean mass of the animals was 9.96 ± 1.64 g, and the mean length was 4.65 ± 0.24 cm. The animals were fed fresh crickets every three days.
2.2. Surgery
All electrophysiological experiments were carried out during the reproductive season of this species. Surgical procedures have been described in detail in previous studies [53,54]. Briefly, the animals were anesthetized with a 0.15% solution of tricaine methanesulfonate (MS-222) before surgery, and seven stainless steel screws (0.8 mm in diameter) were implanted into each animal’s skull to a depth of about 0.8 mm, with the tips resting on the dura mater (Figure 1). Six of them were located over the left and right sides of the telencephalon, diencephalon and mesencephalon, and the reference electrode was located above the cerebellum because its activity is several-fold lower than that of the cerebral regions [55]. Each electrode lead was a formvar-insulated nichrome wire. One end of the wire was tightly wound around the screw, while the other was tin-soldered to a female pin of an electrical connector (the male pin was connected to the cable of the signal acquisition system). The electrodes were fixed to each animal’s skull with dental acrylic. The connector, covered with self-sealing membrane (Parafilm® M; Chicago, IL, USA), was located approximately 1 cm above the animal’s head [56]. The experiments were carried out after 7 days of postoperative recovery. After the experiments were finished, the frogs were euthanized with an overdose of anesthetic, and hematoxylin dye was then injected into the skull holes where the electrodes had been implanted to verify that the electrodes were at the correct locations and that the EEG recordings were acquired from the appropriate brain regions.
2.3. Recording Conditions
An opaque experimental tank (80 × 60 cm and 55 cm deep) containing mud and water was placed in an electromagnetically shielded and soundproof chamber (background noise was 23.0 ± 1.7 dB). An infrared camera with motion detection was mounted approximately 1 m above the tank for monitoring movement status of the animals. A signal acquisition system (Chengyi, RM6280C; Chengdu, China) was used to record the subjects’ electrophysiological signals. The band-pass filter was set at 0.16–100 Hz, while the sampling frequency was set at 1000 Hz.
2.4. Stimulus and Procedure
Five stimuli were used in the present study: white noise, a pure tone of 1000 Hz, a conspecific male advertisement call, a screech call and silence. Because the results of statistical analyses in animal behavior, neuroscience, and ecological studies might be affected by pseudoreplication [57,58,59,60], we used multiple stimulus exemplars to control for these possible effects. Specifically, we randomly selected four conspecific advertisement calls that contained five notes and were recorded from four different individuals inside their burrows. The temporal and spectral parameters of the selected advertisement calls were close to the averages for the population. Since we encountered only one individual that was attacked by a snake in the field, only one screech call containing five “notes” was used in the present study. Both white noise and the pure tone were constructed as a continuous “call” with a duration equal to the average duration of the four advertisement calls (about 1.28 s) and shaped with sinusoidal rise and fall times of 7.5 ms (Figure 2).
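As an illustration of how the two synthetic stimuli could be constructed, the following minimal sketch generates a 1.28 s pure tone and a 1.28 s white noise burst with 7.5 ms onset and offset ramps. The audio sampling rate (44,100 Hz), the quarter-sine ramp shape, the uniform noise distribution and all function names are assumptions made for this example; the text above specifies only the frequency, the duration and the 7.5 ms sinusoidal rise and fall.

```python
import math
import random

FS = 44100          # assumed audio sampling rate in Hz (not given in the text)
DURATION_S = 1.28   # average advertisement call duration reported above
RAMP_S = 0.0075     # rise and fall time reported above


def sinusoidal_ramp(n):
    """Gain values rising from 0 towards 1 along a quarter sine cycle."""
    return [math.sin(0.5 * math.pi * i / n) for i in range(n)]


def shape(samples, ramp_len):
    """Apply the same ramp to the onset and, mirrored, to the offset."""
    out = list(samples)
    for i, gain in enumerate(sinusoidal_ramp(ramp_len)):
        out[i] *= gain        # rise at the start
        out[-1 - i] *= gain   # fall at the end
    return out


def pure_tone(freq_hz, dur_s=DURATION_S, fs=FS, ramp_s=RAMP_S):
    n = round(dur_s * fs)
    raw = [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]
    return shape(raw, round(ramp_s * fs))


def white_noise(dur_s=DURATION_S, fs=FS, ramp_s=RAMP_S, seed=0):
    # Independent uniform samples have a flat expected spectrum (assumed form).
    rng = random.Random(seed)
    n = round(dur_s * fs)
    raw = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return shape(raw, round(ramp_s * fs))


tone = pure_tone(1000.0)
noise = white_noise()
```

Both signals start and end at zero amplitude, which avoids onset and offset clicks; calibration to a target sound pressure level would be done at playback and is not modeled here.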
White noise was paired with one of the other types of acoustic stimuli (but not silence), i.e., white noise vs. target sound. To test voluntary attention, each of the three types of stimulus pairs was selected randomly and presented antiphonally with 1.5 s inter-stimulus intervals (ISI). After 20 presentations to familiarize the subjects with the pattern of the stimulus sequence, the target sound at the last position of every N presentations of this sound was replaced by silence until 120 replacements were achieved (N = 3–6; each value was used 30 times, in random order across replacements). Thus, for each stimulus pair and each frog, a total of 1320 presentations were delivered: white noise was presented 660 times, the target sound 540 times, and the silence replacement 120 times. The session lasted about 66 min and included 4 blocks with 5 min breaks between blocks so that the animals could rest. To test reflexive attention, each of the three types of stimulus pairs was selected and presented randomly using an equiprobability paradigm, in which white noise and the target sound were each presented with 50% probability. The ISI was set randomly to one of 1.1, 1.3, 1.5, 1.7 and 1.9 s for each presentation. Therefore, the subjects could predict neither what the next presentation would be nor when it would occur; thus, voluntary attention to a given stimulus was minimized. For each stimulus pair and each animal, a total of 200 stimulus presentations, with each stimulus played back 100 times, were presented in a random order. The session lasted about 9 min. For the stimulus pairs including advertisement calls, each stimulus pair was broadcast to four animals (two females and two males), and none of the animals had heard the acoustic stimuli before.
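The two presentation schedules can be sketched as follows. This is an illustrative reconstruction, not the C++ playback software used in the study. It assumes that each cycle consists of N white noise/target alternations followed by one white noise/silence alternation, a reading chosen because it reproduces the reported counts (660, 540 and 120); the 20 familiarization presentations are not modeled, and the labels "WN", "T" and "S" and the function names are hypothetical.

```python
import random
from collections import Counter


def voluntary_sequence(rng, n_values=(3, 4, 5, 6), repeats=30):
    """Antiphonal schedule: white noise ("WN") alternates with a target slot.

    Assumed reading: after N target sounds ("T"), the next target slot is
    replaced by silence ("S"); each N is used `repeats` times in random order.
    """
    run_lengths = [n for n in n_values for _ in range(repeats)]
    rng.shuffle(run_lengths)
    sequence = []
    for n in run_lengths:
        sequence += ["WN", "T"] * n   # N ordinary alternations
        sequence += ["WN", "S"]       # target slot replaced by silence
    return sequence


def reflexive_sequence(rng, n_each=100, isis=(1.1, 1.3, 1.5, 1.7, 1.9)):
    """Equiprobable schedule: shuffled stimuli, each with a random ISI (s)."""
    stimuli = ["WN"] * n_each + ["T"] * n_each
    rng.shuffle(stimuli)
    return [(stimulus, rng.choice(isis)) for stimulus in stimuli]


rng = random.Random(1)
voluntary = voluntary_sequence(rng)
reflexive = reflexive_sequence(rng)
voluntary_counts = Counter(voluntary)
reflexive_counts = Counter(stimulus for stimulus, _ in reflexive)
```

Under these assumptions the voluntary schedule contains 1320 presentations in total, and the reflexive schedule contains exactly 100 presentations of each stimulus in an unpredictable order with unpredictable timing.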
Acoustic stimuli were played back using two portable speakers (SME-AFS; Saul Mineroff Electronics, Elmont, NY, USA) that were placed equidistantly at opposite ends of the experimental tank. The sound pressure was adjusted to 65 ± 0.5 dB SPL for each acoustic stimulus using a sound pressure meter (Aihua, AWA6291; Hangzhou, China; re 20 µPa, fast response, C-weighting), measured at the center of the experimental tank, approximately equal to the average natural sound pressure level of male calls. Thus, the sound level distribution at the bottom of the experimental tank was close to a quasi-free sound field. Furthermore, the animals always remained motionless at one corner of the experimental tank throughout the experiments. Accordingly, it was highly unlikely that the ERP measures would be affected significantly by the tiny differences in stimulus amplitude across the tank bottom. All experimental procedures were implemented with custom-made software written in C++, which could automatically save the order of the random stimulus stream. A trigger pulse was sent to the signal acquisition system at every stimulus onset via the parallel port of a PC for further time-locking analysis.
2.5. Data Acquisition and Processing
After recovery for 7 days, each animal was placed in the tank and connected to the signal acquisition system for about 1 day of habituation before the experiments. Then, the EEG signals and behavioral data were recorded according to the experimental paradigms described above. In order to extract ERP components, the raw EEG data were filtered using a band-pass filter of 0.25–25 Hz and a 50 Hz notch filter. For the experiments testing voluntary attention, EEG signals were divided into epochs with a duration of 700 ms, including a pre-stimulus baseline of 200 ms, for the target sound. To analyze the SPN component, EEG signals were divided into epochs with a duration of 2980 ms, from 200 ms before the presentation of white noise to the presentation of silence. In order to test whether auditory perception depended on both voluntary attention and reflexive attention, EEG signals were divided into epochs with a duration of 3480 ms, from 200 ms before the presentation of white noise to 500 ms after the onset of the target sound. For the experiments testing reflexive attention, EEG signals were divided into epochs with a duration of 700 ms, including a pre-stimulus baseline of 200 ms. All epochs were visually inspected, and those with artifacts in which the maximal amplitude exceeded ±60 μV were removed from further analysis. Accepted trials (roughly 55% for each stimulus pair and each brain region) were averaged according to stimulus type for each brain area within each session.
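The epoching, artifact rejection and averaging steps can be sketched as below for the 700 ms epochs. This is a minimal illustration that assumes a single-channel signal in μV sampled at 1000 Hz (so one sample equals 1 ms), onsets given as sample indices, baseline correction by subtracting the pre-stimulus mean, and an amplitude criterion applied after that correction. Filtering and visual inspection are not modeled, and the function names are hypothetical.

```python
def extract_epochs(signal, onsets, pre=200, post=500):
    """Cut baseline-corrected epochs of pre + post samples around each onset.

    signal: list of samples in microvolts at 1000 Hz (1 sample = 1 ms).
    onsets: stimulus onset positions as sample indices (trigger times).
    """
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(signal):
            continue  # skip epochs that run past the recording
        segment = signal[onset - pre:onset + post]
        baseline = sum(segment[:pre]) / pre  # mean of the pre-stimulus part
        epochs.append([x - baseline for x in segment])
    return epochs


def reject_artifacts(epochs, limit_uv=60.0):
    """Keep only epochs whose absolute amplitude never exceeds the limit."""
    return [e for e in epochs if max(abs(x) for x in e) <= limit_uv]


def average(epochs):
    """Point-by-point mean across the accepted epochs."""
    n = len(epochs)
    return [sum(column) / n for column in zip(*epochs)]


# Toy recording: flat signal with one 100 microvolt artifact in the 2nd epoch.
recording = [0.0] * 3000
recording[1600] = 100.0
all_epochs = extract_epochs(recording, [500, 1500, 2500])
accepted = reject_artifacts(all_epochs)
erp = average(accepted)
```

The same functions apply to the longer SPN epochs by changing `pre` and `post`, for example `pre=200, post=2780` for the 2980 ms epochs.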
For each acoustic stimulus and each brain region, the peak of each ERP component could be found in the grand average waveforms that were obtained by averaging the waveforms across all frogs (see Figures S1–S4 in the Supplementary Materials). For all experiments, the latency of the N1 peak was measured from the grand average waveforms for each brain area and each stimulus; then, the median was calculated regardless of brain area and acoustic stimulus. Finally, the time window of the N1 component was defined as the latency range of 20–120 ms after target stimulus onset, with the median as the midpoint. The N1 amplitude was calculated as the mean amplitude in that time window using custom-made software in MATLAB. Similarly, for the experiments recruiting voluntary attention, the SPN amplitude was defined as the mean amplitude during the 500 ms before the onset of the silence replacement. For each ERP component, the latency was calculated as the half-area latency within the same time window as the amplitude measurement, i.e., by computing the area under the ERP waveform over the given latency range and then finding the time point that divides that area into equal halves, using custom-made software in MATLAB [30]. Because we focused on detecting the direction of information flow (top-down or bottom-up), the amplitudes or latencies of each ERP component were averaged over the left and right sides of the telencephalon, diencephalon and mesencephalon, respectively. Human-like auditory P2 and P3 components have been verified in the music frog [43,44]; however, these two components might be linked to brain functions other than attention. Accordingly, we did not consider these components in this study.
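The two measurements described above, mean amplitude and half-area latency, can be sketched as follows. The original analysis was done with custom MATLAB software that is not reproduced here; this Python version is only an illustration. It assumes an averaged waveform sampled at 1000 Hz with 200 pre-stimulus samples, an inclusive time window, and that the area is taken on the rectified (absolute) waveform, which is one common convention and not stated in the text. The function names are hypothetical.

```python
def mean_amplitude(erp, start_ms, end_ms, pre=200):
    """Mean amplitude between start_ms and end_ms (inclusive) after onset."""
    window = erp[pre + start_ms:pre + end_ms + 1]
    return sum(window) / len(window)


def half_area_latency(erp, start_ms, end_ms, pre=200):
    """Latency (ms after onset) at which half of the window area is reached.

    Assumption: the area is computed on the rectified waveform, so a
    negative-going component such as N1 contributes positive area.
    """
    window = [abs(x) for x in erp[pre + start_ms:pre + end_ms + 1]]
    half = sum(window) / 2.0
    running = 0.0
    for i, value in enumerate(window):
        running += value
        if running >= half:
            return start_ms + i
    return None


# Toy waveform: a -2 microvolt deflection from 60 to 80 ms after onset.
toy_erp = [0.0] * 260 + [-2.0] * 21 + [0.0] * 419
n1_amplitude = mean_amplitude(toy_erp, 20, 120)
n1_latency = half_area_latency(toy_erp, 20, 120)
```

For the toy waveform, the half-area latency falls at the center of the deflection (70 ms). The SPN amplitude could be obtained with the same `mean_amplitude` function by pointing the window at the 500 samples that precede the onset of the silence replacement.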
2.6. Statistical Analyses
The Shapiro–Wilk W test and Levene’s test were used to assess the normality of the distribution and the homogeneity of variance for the amplitudes and latencies of each ERP component. For the stimulus pairs including conspecific calls, latencies and amplitudes of ERP components were analyzed using a three-way repeated measures ANOVA with the variables “stimulus pair” (the four stimulus pairs including different conspecific calls), “sex” (female/male), and “brain area” (the telencephalon, diencephalon and mesencephalon). There was no significant main effect of “stimulus pair”, congruent with the idea that the four stimulus pairs did not differ significantly in evoking responses from the animals. Thus, amplitudes or latencies of each ERP component for the stimulus pairs including conspecific calls were pooled regardless of “stimulus pair”. A three-way repeated measures ANOVA was then used for the amplitudes and latencies of the N1 and SPN components with the variables “sex” (female/male), “acoustic stimulus” (conspecific call, pure tone and screech call), and “brain area” (the three brain areas). Both main effects and interactions were examined. Multiple comparisons with the Bonferroni correction were performed when ANOVAs returned a significant difference, and simple effect analysis was performed when interaction effects were significant [61]. If the assumption of sphericity was violated, the Greenhouse–Geisser ε values were employed. The partial η² value was used to determine the effect size (partial η² = 0.20 was set as a small, 0.50 as a medium, and 0.80 as a large effect size) [62]. SPSS software (release 21.0) was used for the statistical analysis, with p < 0.05 as the significance level.
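For readers who wish to recompute effect sizes from reported test statistics, partial η² can be recovered from an F value and its degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function name and the example numbers are hypothetical and are not results from this study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical example values, not taken from the study.
example = partial_eta_squared(4.0, 2, 28)
```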