1. Introduction
The “brain-machine interface” (BMI) is an innovative technology that is undergoing extensive and rapid progress [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. BMIs enable the neuroprosthetic control of external devices by brain activity instead of body movements. Non-invasive BMIs may be appropriate for clinical use, but the accurate control of external devices is currently limited because of imprecise and unstable signals from the brain surface. Therefore, research on invasive BMIs is necessary [8, 11]. Although the development of invasive BMIs has made steady progress and holds promise for future clinical use [4, 13], the invasive BMIs that are currently available remain limited in the accuracy and efficiency of the control they provide. As described in previous papers [8, 9], some technical factors that limit the performance of current BMIs can be identified. However, previous studies [5, 14] have also emphasized that improvements in the technical factors alone cannot solve all of the problems that hinder the development of an ideal BMI, i.e., a system controlling external neuroprosthetic devices without any special training. The ideal BMI requires rich and precise information that depends on brain activity and function. Therefore, as some researchers [15, 16, 17], including the present authors [18, 19], have discussed, knowledge of what the brain is and how it works (the ultimate goals of neuroscience research) is essential for BMI research.
Regarding the ultimate goals of neuroscience, BMI research is particularly related to research on brain information coding [15, 18]. Namely, if the neuronal activity that codes information in the brain can be detected, then that activity could possibly be used to move machines with thoughts alone. However, at present, information coding in the brain is not completely understood [20], and it is not currently possible to detect such activity accurately. Consequently, animals and humans must learn how to change their neural activity in order to operate machines skillfully with a BMI.
Additionally, machines are operated with BMIs to achieve some type of goal. If that goal is achieved, or if the machine can be operated skillfully, then this functions as a reward and will enhance the changed neural activity through reinforcement feedback (Figure 1). In other words, a BMI changes the brain activity itself to enable the acquisition of a reward by directly operating a machine without using the physical body [19]. In fact, in the research to date, brain activity has been changed when animals moved a robotic arm to gain access to food, or received a reward of juice after hitting a target with a screen cursor controlled through a BMI [21]. Accurate device control by BMIs inevitably requires that neuronal activity be volitionally modulated, and the brain can respond to this demand for activity modulation. Such BMI-induced changes in neuronal activity are not restricted to the regions from which the signals used for device control are recorded. Koralek et al. [11, 22] investigated the role of corticostriatal plasticity, usually involved in learning physical skills, in abstract skill learning with a BMI using motor cortical neurons. During the learning period of BMI control, altered activity in the striatal neurons was observed, and strong correlations, reflected in oscillatory coupling, emerged between neuronal activity in the motor cortex and in the striatum. The authors concluded that temporally precise coherence develops specifically in motor output-related neuronal populations during learning and that the oscillatory activity serves to synchronize widespread brain networks to produce appropriate behaviors.
In this way, the core process of BMIs is “changing your own neuronal activity to gain a reward”, and this process is, in fact, operant conditioning of neuronal activity (neural operant conditioning) [23]. Research on neural operant conditioning in animal experiments was begun in the 1960s by Fetz [24], the pioneer in this field. This research is currently known as “neural biofeedback” [23] and is often called “neurofeedback” [25] when it targets human brain activity.
After introducing operant conditioning as a core element in biological learning, e.g., biofeedback or behavioral therapy, the present review elaborates on the role of operant conditioning in the context of BMI control based on volitional changes of neuronal activity, including firing rates and synchrony of neuronal populations. We also briefly discuss the sustainability of conditioned changes in neuronal activity for long-lasting reliability of BMIs, and the possibilities and limitations of applying current BMIs to the people who need them most. A more detailed introduction to and discussion of the present issue from the viewpoint of neuroscience can be found in our previous papers [18, 19].
2. Biofeedback and Operant Conditioning
Operant conditioning is considered a core mechanism in psychotherapy, e.g., cognitive behavioral therapy (CBT) [26], and in biofeedback [27]. Biofeedback, which has been widely used to relax the mind and body and reduce anxiety, is the operant conditioning of inner physiological responses, such as heart rate, blood pressure, and body temperature, to change them in a desired direction. Birbaumer et al. [28] introduced and discussed the historical background of this method and its present significance for BMI research. According to them, it had previously been thought that classical conditioning and operant conditioning were mutually exclusive: the former modulated autonomic functions responsible for regulating internal conditions, such as digestive reactions, heart rate, or glandular reactions, and the latter acted on externally oriented behaviors involving the skeletal system. However, a challenging experiment by Miller [29], which operantly conditioned visceral and glandular responses, was in part responsible for the rapid development of the field of biofeedback and neurofeedback (p. 5 in [28]). Although the failure to replicate operant conditioning in the curarized rat [30] (this issue is discussed again in Section 6) partly restrained the new field of operant conditioning and voluntary control of physiological functions, the emergence of BMIs has revived this research tradition of operant conditioning without knowledge of or reference to the history of biofeedback research (p. 5 in [28]; also see [1] for a review). Exact knowledge of biofeedback and operant conditioning is therefore indispensable for the current and future development of BMI research.
The basic method of biofeedback is to convert changes in autonomic activities, such as heart rate, blood pressure, and body temperature, to a visual or audible signal (feedback signal) that can be perceived by the person, and to present this signal to him/her (Figure 2). Next, the person is instructed only to generate more of the feedback signal, with no attention to his/her autonomic body responses. If the feedback signal increases as instructed, the sense of achievement in reaching that goal becomes the reward, and the feedback signal increases again. At the same time, the physiological responses generating the signal are changing, so by repeating the procedure it becomes possible to intentionally change, in the desired direction, physiological activities that normally cannot be changed intentionally. For example, it becomes possible for a person to reduce his/her own heart rate and blood pressure, or to increase his/her own body temperature [27]. Biofeedback can also use brain waves as the physiological response; this technique is likewise used in psychotherapy to relax the mind and body. For example, by targeting the alpha waves in the brain, a person can increase how often his/her alpha waves occur and achieve a relaxed state. Recently, with the reduced size and cost of electroencephalographs and of the computers that convert brain waves into feedback signals, the development of biofeedback capable of changing specific brainwave components in desired directions (increasing or decreasing) is flourishing, with neurofeedback, in particular, garnering much attention.
An understanding of operant conditioning is essential to understanding biofeedback and neurofeedback. Operant conditioning is a form of learning described in virtually every psychology textbook. It is an experimental procedure that changes the frequency of a response by providing specific stimuli immediately after a spontaneous response by an animal (including humans) [31]. The spontaneous response to be conditioned is called an operant response or operant behavior; the stimulus, such as a reward, given immediately after the response is called a positive reinforcer (or simply a reinforcer); and the operation of providing the reinforcer is called reinforcement. The basis of operant conditioning is setting up and manipulating the relationship between operant response and reinforcer (contingency of reinforcement). For example, to make a rat respond by pressing a lever, a contingency of reinforcement is formed between the operant response (pressing the lever) and the reinforcer (the food); the rat is trained to respond in this manner. The key to more efficient training is to give the reinforcer immediately after the operant response (immediacy of reinforcement). Biofeedback and neurofeedback share this principle with neural operant conditioning. When physiological responses and brain activities change in the desired direction, it is essential to signal those changes with a feedback signal; in humans, presentation of the feedback signal acts as the positive reinforcer, that is, the feedback signal is the reward.
Stimuli that reduce the frequency of responses when given immediately after the operant response are known as punishers rather than reinforcers. If an electric shock is given as punishment after a lever is pressed, the rat will, of course, no longer press the lever. However, if punishment is used, both humans and animals will avoid the training altogether, so it is preferable not to use punishment unless it is necessary for a specific aim. Indeed, if the aim is to reduce the operant response, it is common to use a procedure in which the response, when it occurs, is never reinforced. This process is called extinction. Consequently, the basis of operant conditioning is to use an appropriate combination of reinforcement and extinction to voluntarily increase or decrease certain responses. To ensure more effective progression of operant conditioning, it is also important to understand the schedule on which the reinforcer is given (schedules of reinforcement), the successive approximation method, and shaping [31].
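The combined effect of reinforcement and extinction on response frequency can be illustrated with a toy model. The following is only a minimal sketch: it assumes a simple linear-operator update (in the spirit of the Bush-Mosteller learning model), which is an illustrative choice and is not taken from the studies cited here. The names `update` and `simulate`, the learning rate, and the trial counts are all hypothetical.

```python
def update(p, target, rate):
    """Move response probability p a fraction `rate` of the way toward `target`."""
    return p + rate * (target - p)


def simulate(baseline, n_reinforced, n_extinction, rate=0.2):
    """Return the response probability after each trial.

    Assumption (illustrative only): every reinforced trial pulls the
    probability toward 1.0, and every extinction trial (response never
    reinforced) pulls it back toward the spontaneous baseline.
    """
    p = baseline
    history = [p]
    for _ in range(n_reinforced):
        p = update(p, 1.0, rate)        # reinforcement phase
        history.append(p)
    for _ in range(n_extinction):
        p = update(p, baseline, rate)   # extinction phase
        history.append(p)
    return history


if __name__ == "__main__":
    h = simulate(baseline=0.1, n_reinforced=20, n_extinction=20)
    print(f"after reinforcement: {h[20]:.3f}, after extinction: {h[-1]:.3f}")
```

In this sketch the response probability rises during reinforcement and relaxes toward the baseline during extinction; real learning curves depend on the schedule of reinforcement and on shaping, which the model ignores.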
3. Operant Conditioning of Neural Activity
To understand the efficacy and future possibilities of neurofeedback, it is vital to know the extent to which neuronal activity in the brain can be changed by operant conditioning. Animal experiments are essential for this kind of research and, in fact, neural operant conditioning research using animals is steadily growing, and its developments are increasingly being related to BMI research. Neural operant conditioning experiments using animals started approximately 50 years ago, when the activity of single neurons was targeted in experiments conducted by Fetz [24], as introduced earlier. He recorded the activity of a single neuron in the motor cortex of a monkey for nearly an hour. During that time, the firing rate of the neuron increased when the monkey was given a reward (reinforcement) while the neuron was firing, and returned to the original firing rate when the rewards were withheld (extinction). Fetz also recorded the activity of two neurons in close proximity simultaneously, and found that if the reward was given to the monkey when only one of the neurons was firing, the firing rate of only that neuron would immediately increase, while the activity of the nearby neuron remained unchanged. In addition, if the reward was given to the monkey as the firing rate was decreasing, the firing rate would immediately decrease (in this instance, the reduction in the firing rate was “increased” by reinforcement). In this way, it became apparent that the animal itself can change its own brain activity to obtain a reward, much like physical behavior, even at the level of individual neurons, which are the constituent elements of the brain. Currently, research on neural operant conditioning is growing throughout the world. Arduin et al. [32] recorded multiple neurons from motor cortical areas in rats for controlling a linear actuator with a water bottle. To receive the reward of water, the rats had to move the bottle until it reached a zone for drinking by raising and maintaining the firing rate of each neuron above a high threshold. The firing rates of conditioned neurons increased instantaneously after trial onset, and the bottle entered the drinking zone within a very short time. Furthermore, the conditioned neurons fired more frequently, instantaneously, and strongly than the neighboring neurons that were simultaneously recorded around the conditioned neurons (Figure 3). The authors concluded that only the operantly-conditioned neurons possessing significantly increased firing rates took the lead as “master neurons”, which exhibited the most prominent volitionally-driven modulations in a small neural network. Engelhard et al. [33] successfully conditioned volitional enhancement of oscillatory activity in the monkey motor cortex by targeting the motor cortex local field potential (LFP) (Figure 4). The LFP is a summation of the electrical signals of excitatory and inhibitory synaptic potentials from a large number of neurons neighboring the recording site; the characteristics of LFP waveforms depend on the proportional contribution of the multiple potentials and on various properties of the brain tissue. This study also confirmed that the enhancement of oscillatory activity was not associated with any observed movements or increases in muscle activity.
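The threshold-based control described above for the experiment of Arduin et al. can be sketched as a simple loop over time bins. This is only an illustration under stated assumptions, not the authors' implementation: the function name `run_trial`, the use of a single firing-rate value per time bin, the unit step size, and in particular the rule that the bottle moves back when the rate falls below threshold are all hypothetical choices made for the example.

```python
def run_trial(rates_hz, threshold_hz, zone_position, step=1):
    """Return the index of the time bin in which the reward zone is reached.

    rates_hz      -- firing rate of the conditioned neuron in each time bin
    threshold_hz  -- rate that must be exceeded to advance the actuator
    zone_position -- actuator position at which the animal can drink

    Assumption (hypothetical): the actuator advances by `step` while the rate
    is above threshold and retreats by `step` (not below 0) otherwise, so the
    rate has to be maintained, not just briefly raised. Returns None if the
    zone is never reached.
    """
    position = 0
    for t, rate in enumerate(rates_hz):
        if rate > threshold_hz:
            position += step
        else:
            position = max(0, position - step)
        if position >= zone_position:
            return t  # reward (water) becomes available in this bin
    return None


if __name__ == "__main__":
    # Sustained high firing reaches the zone; intermittent firing does not.
    print(run_trial([5, 20, 25, 30, 8, 28, 31], threshold_hz=15, zone_position=3))
    print(run_trial([20, 5, 20, 5], threshold_hz=15, zone_position=2))
```

The retreat rule is one way to express "raising and maintaining" the rate; a design in which the actuator simply holds its position below threshold would be an equally plausible reading.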
4. Operant Conditioning of Firing Rate and Firing Synchrony in Neuronal Populations
We conducted a neuronal operant conditioning experiment using rats [34], focusing on the hippocampus, which is deeply involved in learning and memory, and the motor cortex, which has functions directly related to behavior. As neuronal operant conditioning is a form of learning, we hypothesized that more significant changes would occur in the neurons of the hippocampus. We found that operant conditioning increases not only the firing rate but also firing synchrony, namely, the occurrence of synchronized spikes of different neurons within the hippocampal neuronal population over a short timeframe.
The device used for the behavioral task was simply an operant box, in which the rat poked its nose into a hole in the wall of the box (nose-poke response) and a food pellet was delivered as a reward. A number of “dodecatrodes” [32] had already been surgically implanted into the rats’ hippocampus. Dodecatrodes are electrodes made up of a bundle of 12 tungsten wires, each with a diameter of 12.5 µm, which can detect the activity of multiple neurons in the vicinity of the electrode. This device enables the activity of isolated individual neurons to be recorded in near real time by processing the data with fast independent component analysis (Fast ICA). The firing synchrony between neurons can also be detected accurately and in real time, even if, for example, two spike waveforms overlap [35].
Initially, we trained the rats to obtain the reward with the nose-poke response, namely, physical behavior (reinforcement of behavior). For the rats this was a simple training procedure, and they were able to obtain the reward consistently within approximately 30 min (see Figure 5, Session 1). Next, we hid the hole into which the rats poked their noses, and instead gave a reward when the firing rate of the hippocampal neuronal population exceeded a certain level (reinforcement of firing rate). In other words, we trained the rats using neural operant conditioning, where the rats obtained rewards by activating their neuronal population. Immediately after starting this training, the rats demonstrated a variety of behaviors (including running around), but after a while they stopped unnecessary movements and the food pellets began to be delivered steadily. The rats were, therefore, able to volitionally fire their hippocampal neuronal population. Within approximately 30 min of starting the experiment, almost all of the rats were able to obtain more rewards through neuronal population activity than through the nose-poke behavior (see Figure 5, Session 2). Lastly, we set up the experiment such that rewards would be delivered only when there was synchronous firing of a neuronal population (reinforcement of firing synchrony). A food pellet was delivered when the recorded group of multiple neurons showed firing synchrony above a given criterion. The presence of firing synchrony to be rewarded was detected within a 2–4 ms time window and determined on the basis of the firing of all the pairs of neurons that had been recorded simultaneously by the dodecatrode in each rat. The criterion for identifying firing synchrony was determined individually for each rat (each dodecatrode) by selecting the width of the time window and setting the minimum number of neuron pairs showing firing synchrony within the selected time window (e.g., more than two neuron pairs showing firing synchrony within a 3-ms time window), ensuring that the numbers of spontaneous operant behavioral responses and synchrony responses before conditioning would be nearly identical. When we performed this conditioning, the rats were able to obtain a steady supply of food pellets within approximately 30 min, as expected (see Figure 5, Session 3). The rats were able to synchronously fire multiple neurons within a single population. In addition, this conditioned enhancement of firing synchrony persisted over the following three days (see Figure 5, Sessions 4–6).
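The two reward criteria described above (population firing rate above a level, and a minimum number of synchronously firing neuron pairs within a short time window) can be sketched in code. This is a minimal illustration under assumptions, not the procedure used in [34]: spike times are taken to be sorted lists in milliseconds, two neurons are counted as synchronous if any of their spikes lie no more than `window_ms` apart, and all function names and default values are hypothetical.

```python
from itertools import combinations


def population_rate_hz(spike_trains, start_ms, bin_ms):
    """Summed firing rate (spikes/s) of all neurons in [start_ms, start_ms + bin_ms)."""
    end_ms = start_ms + bin_ms
    count = sum(1 for train in spike_trains for t in train if start_ms <= t < end_ms)
    return count * 1000.0 / bin_ms


def pair_is_synchronous(spikes_a, spikes_b, window_ms):
    """True if some spike in a and some spike in b are at most window_ms apart.

    Both inputs must be sorted; a two-pointer scan avoids comparing all pairs.
    """
    i = j = 0
    while i < len(spikes_a) and j < len(spikes_b):
        diff = spikes_a[i] - spikes_b[j]
        if abs(diff) <= window_ms:
            return True
        if diff < 0:
            i += 1  # spike in a is too early; try the next one
        else:
            j += 1  # spike in b is too early; try the next one
    return False


def count_synchronous_pairs(spike_trains, window_ms):
    """Number of neuron pairs (out of all pairs) that fire synchronously."""
    return sum(
        1
        for a, b in combinations(spike_trains, 2)
        if pair_is_synchronous(a, b, window_ms)
    )


def reward_for_synchrony(spike_trains, window_ms=3.0, min_pairs=3):
    """Deliver a reward when at least min_pairs pairs are synchronous.

    The defaults mirror the example in the text ("more than two neuron pairs
    within a 3-ms time window"); in practice both would be tuned per animal.
    """
    return count_synchronous_pairs(spike_trains, window_ms) >= min_pairs


if __name__ == "__main__":
    # Five hypothetical neurons; spike times in ms within one analysis epoch.
    trains = [[10.0, 50.0], [11.5, 80.0], [12.0], [30.0], [100.0]]
    print(population_rate_hz(trains, 0.0, 100.0))
    print(count_synchronous_pairs(trains, 3.0), reward_for_synchrony(trains))
```

With five recorded neurons there are ten possible pairs, so a criterion expressed as a minimum number of synchronous pairs can be adjusted together with the window width until the spontaneous rate of criterion events matches the spontaneous rate of the behavioral response, as the text describes.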
The results of investigating the activity of neuronal populations after isolating the activity of individual neurons during this neural operant conditioning period are shown in Figure 6. The firing rate of the overall population, which comprised five neurons, and the firing synchrony within the population were both enhanced during the neural operant conditioning. These results show that the neurons temporarily increase their firing in much the same way as during operant behavior, and that conditioning does not increase the firing of all the neurons in the reinforced population. Furthermore, a similar analysis was conducted on the activity of a neuronal population recorded from another dodecatrode, separated by 0.5 mm from the dodecatrode that recorded the neuronal population being operantly conditioned. That neuronal population was only recorded together with the neuronal population presented in Figure 5 and had not been conditioned, and there was not a single neuron in which the activity had changed during neuronal operant conditioning. Therefore, the firing rate increased only in neurons within the conditioned population (or in the vicinity of that population), and these changes in activity did not spread over a wide area of the brain. We thus ascertained that it is possible to enhance the activity of local neuronal populations in isolation. We also investigated the firing synchrony between all pairs formed among the five neurons within the conditioned population (Figure 7). These results showed that there were pairs that fired synchronously and pairs that did not fire synchronously throughout the entire conditioning period. However, when the firing synchrony was reinforced, the number of pairs that fired synchronously increased.
5. Significance of Firing Synchrony in Neuronal Populations
The operant conditioning of firing synchrony of multiple neurons, shown in the study described above, is closely related to enhancing brain functions, most of which are realized by ensemble activities of populations of neurons that are functionally connected with each other. Such a functional population of neurons has been termed a “cell assembly” [36], and is postulated to act as a functional unit that represents information in the working brain [37, 38].
However, operant conditioning of cell assembly activity is not an easy task because the range of patterns of cell assembly activation, i.e., the sizes of cell assemblies, is thought to be diverse [39]. A cell assembly could be composed of a small number of localized neurons or a large number of broadly distributed neurons [39]. Neurons in the neocortices and the limbic structures are expected to show various forms of firing synchrony, which reflect dynamic and diverse representation by cell assemblies. Therefore, the diversity in the sizes of cell assemblies should be considered when neuronal operant conditioning is applied to enhance synchronized neuronal activity. Our previous study described above [34] demonstrated operantly-enhanced firing synchrony in small and localized groups of neighboring neurons, a form of synchrony that has been shown to be valid for some information processes in higher cortical regions [40]. On the other hand, the study by Engelhard et al. [33] succeeded in operantly enhancing the activity of broader cell assemblies, as reflected by oscillatory low-gamma waves of the LFP, which are produced by synchronized postsynaptic potentials of many neurons over broader ranges. Oscillatory activity in the motor cortex has been observed in many experiments and has led to various hypotheses about its possible functions, such as motor preparation and attention to aspects of movement [41, 42].
Discussion is still ongoing regarding the actual functional role of oscillatory and synchronous activities of groups of neurons. However, with neuronal operant conditioning, as Fetz [43] suggested, those activities become the independent variable in the experiments, and their effects on behavior provide more compelling evidence of their functions. Indeed, Keizer et al. [44] have shown that volitionally-increased gamma oscillation in occipital and frontal sites in humans improved performance on cognitive tests of sensory binding and memory. These results support the notion that various information processes are generated by oscillatory activity in the cortices.
6. Problems in Neural Operant Conditioning for BMI Development
One serious problem of neural operant conditioning that uses a small number of neurons to develop high-precision BMIs is the limited stability of those neurons as a source of signals to control a neuroprosthesis. In addition to this technical problem, it should be made clear how long conditioned changes of neuronal activity can be retained. This problem is related to one of the major and difficult issues in psychology, i.e., the sustainability of learning, but it should be investigated because it is relevant to the long-term reliability of BMIs. Although our previous study [34] reported that the conditioned enhancement of firing synchrony was retained for more than three days (Figure 5), no experiment on neuronal operant conditioning has examined the long-term stability of conditioned changes of neuronal activity. In this regard, transfer of operantly-conditioned firing between different neuronal groups could help to compensate for the limited stability of source signals and of conditioned activity changes. Additional studies in neuroscience (such as those described in [45]) are required to test the possibility of such transfer of conditioned firing in the brain.
With regard to improving the reliability of BMIs, recent noteworthy studies have suggested that electric brain stimulation, namely, transcranial direct current stimulation (tDCS), can enhance Hebbian learning of an abstract skill, such as controlling a neuroprosthetic device. In this regard, Soekadar et al. [46] reported that learning to control sensorimotor rhythms (SMR, 8–15 Hz) was superior in the group that received 20 min of anodal tDCS over the primary motor cortex (M1). The newly acquired skill in the anodal tDCS group remained superior even one month later. Their results indicate that the application of tDCS can modulate the processes of learning to control brain oscillatory activity over a long period, and such paradigms will contribute to improving the reliability and stability of BMIs.
Another problem originates from the fact mentioned in Section 2, namely, the failure to replicate operant conditioning in the curarized rat [30]. As Birbaumer et al. [28] stated, recent studies of invasive BMIs in animals and humans are directed mainly toward the restoration of motor functions to partially overcome handicaps, and less toward dealing with problems due to the complete locked-in state (CLIS). The mechanistic and theoretical reasons for the failure or difficulty of BMIs in CLIS should be investigated. For instance, Birbaumer et al. [47] hypothesized that “loss of the contingency between a voluntary response and its feedback” or “loss of subsequent reward” in individuals who are completely paralyzed would prevent learning even if afferent input and cognitive processing (attention, memory, and imagery) remained intact. Similarly, the reasons for the failure of neural operant conditioning (neurofeedback) in the curarized rat should be experimentally and theoretically investigated.