A Multi-Channel Low-Power System-on-Chip for in Vivo Recording and Wireless Transmission of Neural Spikes

This paper reports a multi-channel neural spike recording system-on-chip with digital data compression and wireless telemetry. The circuit consists of 16 amplifiers, an analog time-division multiplexer, a single 8 bit analog-to-digital converter, a digital signal compression unit and a wireless transmitter. Although only 16 amplifiers are integrated in the current die version, the whole system is designed to work with 64, demonstrating the feasibility of digital processing and narrowband wireless transmission of 64 neural recording channels. Compression of the raw data is achieved by detecting the action potentials (APs) and storing 20 samples for each spike waveform. This compression method retains sufficiently high data quality to allow for single neuron identification (spike sorting). The 400 MHz transmitter employs a Manchester-Coded Frequency Shift Keying (MC-FSK) modulator with low modulation index. In this way, a 1.25 Mbit/s data rate is delivered within a limited band of about 3 MHz. The chip is realized in a 0.35 μm AMS CMOS process featuring a 3 V power supply, with an area of 3.1 mm × 2.7 mm. The achieved transmission range is over 10 m with an overall power consumption for 64 channels of 17.2 mW. This figure translates into a power budget of 269 μW per channel, in line with published results but allowing a larger transmission distance and more efficient bandwidth occupation of the wireless link. The integrated circuit was mounted on a small and light board to be used during neuroscience experiments with freely-behaving rats. Powered by 2 AAA batteries, the system can work continuously for more than 100 hours, allowing for long-lasting neural spike recordings.

J. Low Power Electron. Appl. 2012, 2, 212


Introduction
There is a growing need for wireless implantable neural recording systems that can simultaneously acquire neural signals from a large number of channels. Recording from hundreds of neurons may allow neurophysiologists to study brain function in freely-behaving animals, and in the future will provide the signals necessary to control neural prosthetic devices [1]. Compared to wired systems, wireless neural recording devices provide a number of benefits: reduction of motion artifacts and freedom of animal movement in neuroscience experiments, as well as reduced infection risks and higher patient mobility in neuroprosthetic applications. However, these systems entail severe technical challenges, such as the robustness of the wireless link and the need for a large data transmission band, coupled with low power dissipation and small size in the case of brain-implantable devices. The main issue is the huge amount of data associated with a large number of recording channels: in a 64 channel system featuring a 20 kHz sampling frequency (the neural signal band extends to about 7-10 kHz) with 8 bit resolution, the required data rate of the wireless link is 10.24 Mbit/s. Achieving such a high bit rate is straightforward neither in fully implantable devices, where power consumption is limited to about 10 mW to avoid tissue necrosis [2], nor in battery-powered systems, where a reasonable battery life must be maintained. Hence, most integrated circuits (ICs) reported in the literature that simultaneously support neural recording and wireless telemetry must trade off power consumption against the amount of data to be wirelessly sent.
Data compression was adopted to solve this problem in earlier systems. For example, since in a typical extracellular electrode trace most of the useful information is contained in spikes (or action potentials, APs), Harrison et al. [3] transmit only the spike occurrence information, achieving a low bit rate of 330 kbit/s for 100 channels. However, since the examination of the raw traces may improve the selection of the spike detection threshold, the system reported in [4] allows raw data to be collected from 2 out of the 64 channels present in the system, while only spike occurrence times are sent for the remaining 62 channels via a 2 Mbit/s wireless link. On the other hand, spike waveforms are used in neuroprosthetic applications to assign spikes to specific neurons, improving the prediction of an intended movement [5]. Hence, during the last four years, wireless systems capable of sending all raw data have been reported. In [6], Chae et al. describe a 128 channel system able to transmit all raw data using an impulse-radio ultra-wide band (IR-UWB) transmitter with up to 90 Mbit/s rate. In such a system, problems may arise from skull and skin absorption at high frequencies (the signal spectrum is in the 3.5-4.5 GHz range), from pulse distortion and from the need for an external broadband antenna. As a matter of fact, no in vivo tests are reported for this kind of system. In [7], Ghovanloo et al. report a system that transmits raw data from 32 channels, encoded in a pulse-width modulated (PWM) signal via a 640 kS/s frequency-shift keying (FSK) wireless link. In this system, the transmitted information relies on the time width of each symbol, making the transmission prone to the noise of the wireless link and of the receiver. Moreover, the system in [7] requires a relatively large bandwidth (38 MHz), which means a higher probability of RF interference, and does not meet the band requirements of any medical communication standard.
We propose here a solution that is intermediate between the two extremes described above. The idea was to employ a data compression algorithm in order to:
• preserve the information needed for single neuron identification;
• reduce the throughput and thus the power consumption, since the power delivered to the antenna is directly proportional to the bit rate [8];
• keep the bandwidth limited to a few MHz, thus reducing the probability of RF interference and verifying the possibility of making the system compliant in the future with the Medical Implant Communication Service (MICS) or Industrial Scientific and Medical (ISM) bands, in the 402-405 MHz and 902-928 MHz frequency ranges, respectively.
In particular, having reduced the data rate, it was possible to extend the transmission distance well beyond the 1 m value reported in the above-mentioned works, making realistic in vivo experiments possible with a small penalty in terms of power consumption. The proposed integrated circuit, already presented in [9], has been included in a complete system, made of a transmitting and a receiving unit, that is in use in a neuroscience laboratory on freely-behaving animals in place of commercial wired systems to avoid tethering effects and to provide electrical isolation. In this paper we describe in detail the architecture and circuit design choices, the electrical performance of the IC and the in vivo tests on small laboratory animals.
The paper is organized as follows. Section 2 describes the overall system made of a signal conditioning and transmitting unit and a receiver built with off-the-shelf components. Section 3 discusses in detail the implementation of the system-on-chip (SoC) building blocks and validates the data compression algorithm by means of ad-hoc simulations, while Section 4 shows the electrical characterization of the integrated circuit and reports the results of in vivo tests on freely-behaving laboratory animals. Finally, a comparison of the implemented SoC with the state-of-the-art is presented in Section 5.

System Architecture
The system consists of a wireless recording unit, a receiver built with off-the-shelf modules to maximize its sensitivity, and a remote host equipped with a graphical user interface (GUI) to allow neural signal visualization during in vivo experiments.

Wireless Recording Unit
Figure 1 shows the block diagram of the IC, which acts as the recording and transmitting module. The circuit consists of 16 low-noise amplifiers (LNAs), which perform low-power and low-noise amplification of neural signals. The amplifier outputs are sampled by a time-division multiplexer (TDM) at 20 kS/s onto one data lead before being further amplified by a variable-gain amplifier (VGA) and converted by an 8 bit successive approximation register (SAR) analog-to-digital converter (ADC). The digital data are then processed by a logic block, performing data compression, assembly and Manchester encoding of the bit stream. Finally, the digital signal is sent to a 400 MHz binary FSK wireless transmitter based on direct modulation of a Voltage Controlled Oscillator (VCO), enabling a 1.25 Mbit/s data rate. A crystal Clapp oscillator employing an external 20 MHz quartz and a power-on/reset (POR) circuit complete the system, providing a clean 20 MHz clock and a synchronization signal at start-up. It is worth pointing out that, to reduce silicon area and costs, only 16 amplifiers have been integrated. The remaining 48 channels have been added by connecting a voltage reference to the multiplexer input. In this way all the stages following the amplifier array are designed to process signals from a 64 channel front-end, thus making it possible to reliably assess the actual performance of a complete 64 channel system.

Receiver and Graphical User Interface
The receiver (see Figure 2) consists of a quadrature zero-IF down-converter (MAX3580) using an LNA with a noise figure of 4.7 dB at 400 MHz, RF and baseband tunable filters and automatic gain control circuits. The quadrature signals are converted by a dual channel, 20 MSps, 10 bit ADC (AD9201) and fed to a Xilinx FPGA module (OpalKelly, XEM3005), which performs frequency demodulation and Manchester decoding and sends the received data to a PC via a USB link. In addition, a Graphical User Interface (GUI) based on LabVIEW/C software was implemented to allow data saving and on-line processing.

Analog Front-End
Amplification and filtering of the input signal are performed by a three-stage circuit shown in Figure 3. The first stage is an AC-coupled high-pass filter, using two MOS-bipolar pseudo-resistors as feedback elements [10], which enables the synthesis of high-value resistances without using large-area components. The mid-band amplifier gain is given by $G_1 \cong -C_1/C_2$, set to $-67$ by taking $C_1 = 10$ pF and $C_2 = 150$ fF. The high-pass pole frequency is placed below 10 Hz to reject the offset and the slow voltage drift of the electrode. A tunable $G_M$-C high-pass filter with a cut-off frequency $f_{HP}$ of about 300 Hz is introduced after the first stage. It is designed to reject low-frequency signals, such as Local Field Potentials (LFPs) in the 1-100 Hz frequency range, which can prevent a correct detection of the neural spikes or even saturate the amplifier, and it cuts off the input-referred $1/f^2$ noise due to the pseudo-resistors of the first amplification stage. After the selective high-pass filter, a second non-inverting gain stage is added to provide further signal amplification and to define the high-frequency cut-off, $f_{LP}$. The stage is a single-ended capacitive-coupled voltage amplifier with a gain $G_2 \approx 33$, achieved by using $C_3 = 5$ pF and $C_4 = 150$ fF. A capacitive-coupled structure was preferred to a purely resistive feedback amplifier to minimize the current drawn by the OTA output stage. The DC voltage at the amplifier input is determined by the pseudo-resistor elements in the feedback path, while the low-pass cut-off frequency is set by the gain-bandwidth product of the operational amplifier ($GBWP_2$) to about $GBWP_2/G_2 \cong 10$ kHz. Note that at very low frequencies this non-inverting stage has unity gain in order not to amplify the offset of the operational amplifier.
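The gain figures quoted above follow directly from the capacitor ratios. The short sketch below recomputes them; it assumes that $G_2$ is simply the ratio $C_3/C_4$ (the text quotes only the approximate value of 33) and derives the gain-bandwidth product implied by the stated 10 kHz low-pass corner. The variable names are illustrative and do not come from the original design files.

```python
import math

# Component values quoted in the text.
C1, C2 = 10e-12, 150e-15   # first-stage input and feedback capacitors (F)
C3, C4 = 5e-12, 150e-15    # second-stage capacitors (F)

g1 = -C1 / C2              # inverting first stage, about -67
g2 = C3 / C4               # second stage, about 33 (assumed to be the plain ratio)

# Cascaded mid-band gain of the two stages, in dB (VGA excluded).
gain_db = 20 * math.log10(abs(g1) * g2)

# The low-pass corner f_LP = GBWP_2 / G_2 = 10 kHz implies this
# gain-bandwidth product for the second operational amplifier.
f_lp = 10e3
gbwp2 = f_lp * g2

print(f"G1 = {g1:.1f}, G2 = {g2:.1f}")
print(f"two-stage gain = {gain_db:.1f} dB, implied GBWP2 = {gbwp2 / 1e3:.0f} kHz")
```

Under these assumptions the two stages provide roughly 67 dB before the variable-gain amplifier, and the second operational amplifier needs a gain-bandwidth product of a few hundred kHz.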

Noise Analysis and First Stage Sizing
In order to minimize the amplifier array power consumption while maintaining the input-referred noise specifications, a rigorous noise analysis of the preamplifier and filter has to be carried out following the approach adopted in [10]. The noise performance of the whole LNA depends on the design of the first-stage OTA, whose input noise power density is mainly determined by a compromise between its thermal and flicker noise. To explain this point, let us denote as $C_p$ the input parasitic capacitance to ground of each input terminal of the first-stage OTA [11]. If we neglect the noise contribution due to the two pseudo-resistors (this term will be addressed later), we can write the input-referred noise of the overall amplifier as:

$$E_{neq}^2 = \left(1 + \frac{1}{|G_1|} + \frac{C_p}{C_1}\right)^2 E_{neq,OTA}^2 \qquad (1)$$

where $E_{neq,OTA}^2$ is the input-referred voltage noise of the first-stage OTA. Equation (1) highlights that $C_p$, which is mainly determined by the input transistor dimensions, cannot be increased too much without compromising the overall input-referred noise. On the other hand, it is well known that large-area input transistors are required to minimize the OTA flicker noise. Therefore, to achieve the best overall performance these two contradictory requirements have to be balanced. Furthermore, according to Equation (1), the first stage gain, $G_1$, can be increased by maximizing $C_1$. Its value is limited by the electrode impedance, whose capacitive component is in the 160-320 pF range. The choice of $C_1 = 10$ pF and $C_2 = 150$ fF, thus $G_1 \approx -67$, results in an upper limit of 0.5 pF for the parasitic capacitance $C_p$, which guarantees sufficient margin to keep the 1/f noise under control at the cost of a marginal 10% increase in the overall amplifier noise with respect to the ideal OTA [see Equation (1)]. In such a design, $E_{neq}^2$ and $E_{neq,OTA}^2$ are almost the same; therefore, both terms will be denoted with the same symbol, $E_{neq}^2$.
Table 1. Transistor operating points.

To minimize the noise, the first OTA stage has been designed as a simple telescopic cascode amplifier (see Figure 3). This well-known configuration guarantees excellent noise performance thanks to its few transistors, while cascoding and proper transistor sizing ensure enough gain. Straightforward analysis relates the input-referred noise power spectral density to the noise sources of the transistors, leading to Equation (2). The result has been derived by taking into account that the transistors $M_{bias}$ and $M_{cas}$ do not significantly contribute to the overall noise. To minimize the noise, the condition $g_{mp} \gg g_{mn}$ must be fulfilled, i.e., $(W/L)_p \gg (W/L)_n$. Under this assumption the total noise power density reduces to Equation (3), with $\gamma$ equal to 2/3 for transistors working in strong inversion or $1/(2\kappa)$ in weak inversion ($\kappa \approx 0.7$) [12]. In order to minimize the thermal noise without increasing the current consumption, the $M_p$ transistors have to work in the weak inversion region, where the transconductance for a given current is maximum. Its value can be estimated from the EKV model [13], valid in all regions of inversion:

$$g_m = \frac{\kappa I_D}{U_T} \cdot \frac{2}{1 + \sqrt{1 + 4\,IC}} \qquad (4)$$

$U_T$ and $IC$ being the thermal voltage and the inversion coefficient [12], respectively. The latter parameter is defined in Equation (5) and is much less than 1 in the weak-inversion or subthreshold region. From Equation (4), the overall-amplifier input-referred thermal noise power spectral density results:

$$E_{neq,th}^2 = \frac{8kTU_T}{\kappa^2 I_{bias}} \qquad (6)$$

with $I_{bias}$ being the bias current of the telescopic cascode amplifier. Assuming a first-order roll-off of the frequency response, the noise bandwidth is $(\pi/2) \cdot 10$ kHz $\cong 15.7$ kHz. For an upper limit of 3 µV rms for the thermal noise contribution in this noise band, a minimal $I_{bias}$ value of about 3 µA is required; to be conservative, we set $I_{bias} = 4$ µA.
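The 3 µA figure can be checked numerically. The sketch below assumes that the input pair works in deep weak inversion, so that the thermal noise density is $8kTU_T/(\kappa^2 I_{bias})$ with $I_{bias}$ split equally between the two input transistors, and it assumes a temperature of 300 K, which the text does not state. The function and variable names are illustrative only.

```python
import math

k_B = 1.380649e-23       # Boltzmann constant (J/K)
q = 1.602176634e-19      # elementary charge (C)
T = 300.0                # assumed temperature (K); not given in the text
kappa = 0.7              # subthreshold slope factor quoted in the text

U_T = k_B * T / q                     # thermal voltage, about 25.9 mV
noise_bw = math.pi / 2 * 10e3         # first-order noise bandwidth, about 15.7 kHz
v_target = 3e-6                       # thermal noise budget (V rms)


def thermal_rms(i_bias):
    """Input-referred thermal noise (V rms) for a given tail current (A).

    Assumes the weak-inversion density 8*k*T*U_T / (kappa^2 * I_bias).
    """
    psd = 8 * k_B * T * U_T / (kappa ** 2 * i_bias)
    return math.sqrt(psd * noise_bw)


# Smallest tail current that meets the 3 uV rms budget.
i_min = 8 * k_B * T * U_T * noise_bw / (kappa ** 2 * v_target ** 2)

print(f"minimum I_bias = {i_min * 1e6:.2f} uA")
print(f"thermal noise at 4 uA = {thermal_rms(4e-6) * 1e6:.2f} uV rms")
```

With these assumptions the minimum current comes out close to 3 µA, and the conservative choice of 4 µA leaves some margin below the 3 µV rms budget.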
With regard to the flicker noise, its contribution can be reduced by increasing the PMOS transistor area ($W_p \times L_p$). Considering the flicker noise coefficient of the adopted technology and setting a noise corner frequency lower than 100 Hz, the input PMOS transistors were sized with $W_p \times L_p = 400 \times 1$ µm². This sizing results in a stray OTA input capacitance $C_p$ of approximately 500 fF, which does not excessively impair the equivalent input noise. Moreover, in order to fulfill the requirement $(W/L)_p \gg (W/L)_n$, the settings $W_n = 5$ µm and $L_n = 40$ µm were applied, thus forcing these devices to work in the strong inversion region (see Table 1). For this choice of the transistor lengths, cascoding of the input differential pair is needed in order not to degrade the output resistance and, thus, not to lower the amplifier gain. The introduction of $M_{cas}$ guarantees an output resistance $R_{out} = r_{0n} \| (g_{mcas} r_{0cas} r_{0p}) \approx r_{0n}$ and an amplifier gain $g_{mp} r_{0n} \approx 75$ dB.
In summary, by careful transistor sizing, the noise within the amplifier band may be reduced to the sole thermal noise. Since in a generic amplifier a trade-off exists between noise and current consumption, its efficiency may be evaluated by means of the noise efficiency factor (NEF), a parameter commonly adopted in the literature, proposed in [14] and defined as:

$$NEF = V_{in,rms} \sqrt{\frac{2 I_{tot}}{\pi \cdot U_T \cdot 4kT \cdot BW}} \qquad (7)$$

where $V_{in,rms}$ is the input-referred rms noise, $I_{tot}$ is the total supply current and $BW$ is the bandwidth of the amplifier. In this limit, the minimum NEF achievable with a simple differential stage can be estimated from Equations (6) and (7), leading to a value of $\sqrt{2}/\kappa = 2.02$ [15]. However, in our design the contribution of the current mirror transistors to the thermal noise cannot be completely neglected. In fact, even if the $M_p$ transistors work in weak inversion and the $M_n$ FETs are biased in the strong inversion region, the transconductance ratio is only about 7 (see Table 1). Considering the general expression of the transconductance given by Equation (4) and the total input-referred thermal noise given by the first two terms of Equation (2), the theoretical NEF for this preamplifier becomes:

$$NEF = \frac{\sqrt{2}}{\kappa} \sqrt{1 + \frac{4\kappa\gamma}{1 + \sqrt{1 + 4\,IC_n}}} \qquad (8)$$

where $\gamma \approx 2/3$, $\kappa \approx 0.7$ and $IC_n$ is the inversion coefficient of the current mirror transistors. In reality, the NEF of the proposed amplifier is slightly larger than this theoretical limit for three main reasons:
• The transconductance of the input transistors is slightly lower than $\kappa I_D/U_T$ since their inversion coefficient is larger than zero. For the present design, the input transistor $IC$ is 0.075 (see Table 1), and consequently the rms input noise and the NEF increase by a factor of 1.034.
• The input-referred noise power of the overall amplifier is larger by a factor $(1 + 1/|G_1| + C_p/C_1)^2$ than that of the first operational amplifier, as stated by Equation (1). Taking into account that the parasitic input capacitance is approximately 0.5 pF, mainly due to the OTA input transistors, the corresponding factor on the rms noise is equal to 1.065. A further contribution derives from the strays associated with the input capacitor plates, which was drastically reduced by connecting the capacitor bottom plate (which has the largest parasitic capacitance) to the amplifier input and the top plate to the OTA terminal. In this way, a parasitic capacitance larger than 1.5 pF was avoided.
• The current drawn by the second amplifying stage contributes to the total current in Equation (7) but does not lower the input-referred thermal noise. Therefore, the NEF increases by a factor of $\sqrt{1 + I_2/I_1} \cong 1.037$, where $I_1$ and $I_2$ are the currents drawn by the first and the second operational amplifier, respectively.

These three factors increase the estimated NEF of the overall amplifier to about 2.45. Finally, note that the adoption of a telescopic cascode amplifier, known to have a small output swing, is not a limiting factor for the input signal amplitude. In fact, the positive voltage swing is $\approx 450$ mV ($|V_{T,p}| + V_{cas}$) while the negative swing is about 1 V ($1.5\ \mathrm{V} - V_{ov,n}$), thus assuring a maximum amplitude for the input signal of about 7 mV.
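The chain of correction factors leading to the NEF of about 2.45 can be retraced with a few lines of arithmetic. The sketch below makes several assumptions that are not stated explicitly in the text: the current-mirror contribution is modelled as a relative thermal-noise increase of $2\kappa\gamma\, g_{mn}/g_{mp}$ with the quoted transconductance ratio of 7, the first-stage current is $I_1 = 4$ µA, and the second operational amplifier draws $I_2 = 300$ nA (200 nA plus 100 nA, as given in the section on the second amplifying stage).

```python
import math

kappa, gamma = 0.7, 2 / 3        # values quoted in the text
gm_ratio = 7.0                   # g_mp / g_mn, from Table 1 as quoted in the text

# Ideal weak-inversion differential pair.
nef_ideal = math.sqrt(2) / kappa

# Assumed model of the current-mirror thermal noise contribution.
nef_mirror = nef_ideal * math.sqrt(1 + 2 * kappa * gamma / gm_ratio)

# Factor 1: input pair not in deep weak inversion (IC = 0.075).
ic_in = 0.075
f_ic = math.sqrt((1 + math.sqrt(1 + 4 * ic_in)) / 2)

# Factor 2: capacitive divider at the OTA input, 1 + C2/C1 + Cp/C1.
C1, C2, Cp = 10e-12, 150e-15, 0.5e-12
f_cp = 1 + C2 / C1 + Cp / C1

# Factor 3: current of the second amplifier (assumed I1 = 4 uA, I2 = 0.3 uA).
I1, I2 = 4e-6, 0.3e-6
f_i2 = math.sqrt(1 + I2 / I1)

nef_total = nef_mirror * f_ic * f_cp * f_i2

print(f"ideal NEF         = {nef_ideal:.2f}")
print(f"with mirror noise = {nef_mirror:.2f}")
print(f"factors           = {f_ic:.3f}, {f_cp:.3f}, {f_i2:.3f}")
print(f"estimated NEF     = {nef_total:.2f}")
```

Under these assumptions the three factors evaluate to about 1.034, 1.065 and 1.037, and the product lands within a few hundredths of the 2.45 quoted in the text.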

Second Amplifying Stage
A second gain stage is needed to increase the amplitude level of the input signal before multiplexing. Since this stage requires a relatively large output swing, a two-stage OTA can be employed. Provided that the first stage has enough gain, the impact of this second amplifier on the input-referred noise is negligible, and its power dissipation can be reduced without affecting the noise performance. In fact, the noise of the second op-amp is dominated by the input transistors, designed to operate in the weak-inversion region. If we limit the contribution to the equivalent input noise added by the second stage OTA to less than 0.3 µV rms (i.e., 1/10 of the dominant contribution), a minimum current of 100 nA is needed in the input differential pair of the second op-amp. To be conservative, we set the bias current to 200 nA in the first stage of this op-amp, while a current of 100 nA is drawn by its second stage.
Finally, nonlinear distortion may be of some concern in the second stage. Distortion arises from the non-linear high-resistance pseudo-resistors placed in the feedback path, which is driven by a large output voltage swing. For this reason the subthreshold MOS transistors were sized to have resistance values an order of magnitude higher than those in the first stage. This choice makes the signal current flowing through them always several orders of magnitude lower than the signal current through the capacitors, limiting the measured Total Harmonic Distortion (THD) generated by the stage to below 5% even when the pseudo-resistors are modulated by a 1.5 V peak-to-peak 1 kHz sinusoid.

High-Pass Filter Design and Optimization
Let us now consider the noise due to the MOS-bipolar pseudo-resistors that was neglected in Equation (2). It was experimentally verified that the noise spectral density of the pseudo-resistors complies with the usual equation:

$$E_{nR}^2 = 4kTR \qquad (9)$$

where $R$ is the small-signal incremental resistance. This noise contribution to the OTA input-referred noise power spectral density follows a $1/f^2$ law and, for frequencies above $f_1$, is given by:

$$E_{neq,R}^2(f) = 2 \cdot \frac{4kTR}{G_1^2} \left(\frac{f_1}{f}\right)^2 \qquad (10)$$

where $f_1 = 1/(2\pi R C_2)$ is the cut-off frequency of the high-pass filter set by the first amplifier stage. The factor of 2 in Equation (10) accounts for the two pseudo-resistors, one in the feedback path and one connected to the positive input terminal of the operational amplifier. For simplicity, let us now neglect the presence of the selective high-pass filter following the first stage. By integrating Equation (10) from $f_1$ to the low-pass cut-off frequency of the overall amplifier, $f_{LP} \gg f_1$, we get:

$$\overline{E_{neq,R}^2} \cong \frac{8kTR f_1}{G_1^2} = \frac{4kT}{\pi C_2 G_1^2} \qquad (11)$$

which is independent of the cut-off frequency $f_1$. From Equation (11) the contribution of the pseudo-resistors to the OTA input-referred noise is about 2.79 µV rms. This value is not negligible and explains why it is not convenient to implement the selective high-pass filter (i.e., the 300 Hz cut-off frequency high-pass filter) in the first stage, as this solution would degrade the noise performance. The $G_M$-C high-pass filter is instead able to cut off most of the noise from the pseudo-resistors of the first stage, but a careful choice of the filter capacitor is needed to reduce its noise contribution. We will discuss this point in detail. Let us now suppose that $f_1 \ll f_{HP}$, where $f_{HP}$ is the cut-off frequency of the $G_M$-C high-pass filter. For frequencies above $f_1$, the input-referred power spectral density due to both the pseudo-resistors of the first stage and the $G_M$-C filter can be written as Equation (12), where $S_{nI}^2$ is the output current noise of the high-pass filter. For simplicity, let us assume that the current noise generated by the $G_M$ cell ($G_M = 1/R_{HP}$) can be written as $4kT/R_{HP}$, so that Equation (12) becomes Equation (13). Integrating Equation (13) over the amplifier band, i.e., from $f_{HP}$ to $f_{LP}$, yields Equation (14), a sum of two terms. The result suggests reducing the first term by minimizing the $f_1/f_{HP}$ ratio, while a large capacitor value $C_{HP}$ is needed to reduce the second term. In practice, the $G_M$ cell is a simple differential stage (see Figure 3) whose bias current can be externally tuned. Because a small bias current ($I \cong 1$ nA) is needed to synthesize a high-value resistance, all the transistors work in the weak-inversion region and their transconductance is $G_M = g_{m1} \cong \kappa I/U_T$. The current noise of this configuration is therefore:

$$S_{nI}^2 = \frac{2}{\kappa} \cdot \frac{4kT}{R_{HP}} \qquad (15)$$

The current noise of the $G_M$ cell is thus a factor of $2/\kappa$ larger than that of a resistor $R_{HP}$, and this factor has to be added in the second term of Equation (14). In summary, in order to keep the input-referred noise due to the first stage pseudo-resistors and to the $G_M$ cell in the amplifier band lower than 1 µV rms, $f_1$ has to be at least a factor of 10 smaller than $f_{HP}$ and $C_{HP}$ must be greater than 2 pF. The two pseudo-resistors in the first stage of the circuit in Figure 3 were designed to have $f_1 < 10$ Hz (with $f_{HP} = 300$ Hz) and $C_{HP} = 7$ pF.
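The statement that the integrated pseudo-resistor noise does not depend on $f_1$, and the value of about 2.79 µV rms, can be checked numerically. The sketch below assumes a $1/f^2$ density of the form $2 \cdot 4kTR\,(f_1/f)^2/G_1^2$ above $f_1$, the relation $f_1 = 1/(2\pi R C_2)$ and a temperature of 300 K; none of these is given explicitly in the surviving text, so the result should be read as a plausibility check rather than as the authors' calculation.

```python
import math

k_B = 1.380649e-23        # Boltzmann constant (J/K)
T = 300.0                 # assumed temperature (K)
C1, C2 = 10e-12, 150e-15  # first-stage capacitors from the text (F)
G1 = C1 / C2              # magnitude of the first-stage gain


def pseudo_resistor_noise_rms(f1, f_lp=10e3):
    """Input-referred rms noise (V) of the two pseudo-resistors.

    Integrates the assumed density 2*4kTR*(f1/f)^2/G1^2 from f1 to f_lp,
    with R chosen so that f1 = 1/(2*pi*R*C2).
    """
    R = 1 / (2 * math.pi * f1 * C2)
    power = 2 * 4 * k_B * T * R * f1 ** 2 * (1 / f1 - 1 / f_lp) / G1 ** 2
    return math.sqrt(power)


for f1 in (0.1, 1.0, 10.0):
    v = pseudo_resistor_noise_rms(f1)
    print(f"f1 = {f1:5.1f} Hz -> {v * 1e6:.2f} uV rms")
```

The three corner frequencies give essentially the same result, close to the figure quoted in the text, which illustrates why lowering $f_1$ alone does not reduce this noise contribution and why the selective filter is placed after the first stage.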

Line Buffer and Multiplexer
A circular shift register, controlled by a 625 kHz clock ($ck_{mux}$ in Figure 1), sequentially enables each amplifier to access the common data lead; each pass gate switches on both the rising and the falling edge of the multiplexer clock, avoiding clock transitions in the middle of the sampling window. As in this topology each amplifier has to drive a long routing line connecting all the pass-gate outputs, a class-AB buffer is needed to drive the line. To save power, each buffer is switched off when the corresponding amplifier is not selected and turned on when the previous amplifier is selected (i.e., one clock edge before). In this way, at any time, only two buffers are powered, drawing about 25 µA each from the supply. Further amplification is provided before sampling by a variable-gain amplifier (VGA) added to the amplification chain at the multiplexer output. The gain can be digitally varied between 1 and 8 by means of two auxiliary control bits.

Analog to Digital Converter
The multiplexed and amplified signal is then converted into digital form by an 8 bit SAR ADC (see Figure 1). This topology is well-suited for low-power and small-area applications since it requires a minimal amount of analog circuitry. To save area, analog multiplexing of input signals, requiring only one A/D converter, was employed even though it resulted in a small power penalty with respect to the use of one converter per amplifier and digital multiplexing of data [16].
The number of converter bits was chosen taking into account the following considerations:
• The maximum amplitude for an extracellular action potential is about 1 mV [17], while the minimum signal is about 10 µV, the latter being determined by the typical input noise due to neural background activity and electrode impedance [18]. Therefore, the ratio between the Full Scale Range (FSR) and the ADC least significant bit, i.e., the dynamic range of the converter, should be better than 1 mV/10 µV ≈ 100. This results in a converter resolution larger than 6.35 bit.
• The ADC quantization noise has to be kept much lower than the minimum detectable signal, i.e., the rms input noise. This requirement translates into Equation (16), where $G$ is the amplifier gain and $n$ is the number of converter bits. The worst-case condition occurs for the minimum gain of the amplification chain, which is set by the ratio between the FSR and the maximum amplitude of the input signal, $A_{max}$, i.e., $G_{min} = FSR/A_{max}$ (17). In the present design we set $G_{min} \cong 2000$, which leads to the bound on $n$ given by Equation (18). In practice, taking into account that the effective number of bits (ENOB) is always lower than the nominal figure, the use of an 8 bit ADC provides a safe margin for the design. A higher resolution would bring no benefit and would only increase the ADC power consumption.
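The two considerations above can be illustrated with a short calculation. The 6.35 bit figure is consistent with the usual ideal-quantizer relation SNR = 6.02 n + 1.76 dB applied to a dynamic range of 100 (40 dB); that the authors used this relation is an assumption. The quantization-noise estimate further assumes a uniform quantizer (LSB/√12) and an FSR equal to $G_{min} \cdot A_{max}$.

```python
import math

a_max = 1e-3       # largest extracellular spike amplitude (V), from the text
a_min = 10e-6      # smallest signal, set by background noise (V), from the text
g_min = 2000       # minimum gain of the amplification chain, from the text
n_bits = 8

# Dynamic range and the number of bits it requires.
dynamic_range = a_max / a_min
dr_db = 20 * math.log10(dynamic_range)
bits_snr = (dr_db - 1.76) / 6.02        # ideal-quantizer SNR relation (assumed)
bits_log2 = math.log2(dynamic_range)    # plain FSR/LSB ratio, for comparison

# Input-referred quantization noise at minimum gain, assuming
# FSR = g_min * a_max and a uniform quantizer (LSB / sqrt(12)).
fsr = g_min * a_max
q_noise_in = fsr / (2 ** n_bits * math.sqrt(12) * g_min)

print(f"dynamic range = {dr_db:.0f} dB -> {bits_snr:.2f} bit (SNR relation)")
print(f"log2(FSR/LSB) = {bits_log2:.2f} bit")
print(f"input-referred quantization noise = {q_noise_in * 1e6:.2f} uV rms")
```

With 8 bits the input-referred quantization noise stays close to 1 µV rms under these assumptions, well below the 10 µV minimum signal, which is in line with the margin argued in the text.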
The ADC consists of three main parts: the capacitor array, the comparator and the logic block performing the successive-approximation algorithm. The capacitor array, which acts as sampling capacitance during the sampling interval and as charge-sharing DAC during the conversion period, has been implemented as a binary array. The value of the unit capacitance, $C_{LSB}$, was chosen considering two requirements:
• The noise in the sampling phase has to be significantly smaller than the quantization noise. Thus, the total capacitance of the array, $C_{TOT}$, must satisfy the condition:

$$\frac{kT}{C_{TOT}} \ll \frac{1}{12} \left(\frac{FSR}{2^n}\right)^2 \qquad (19)$$

which clearly is not a limiting factor, resulting in $C_{TOT}$ much larger than about 1 fF.
• The LSB capacitance has to be sufficiently accurate and it has to fulfill the requirement of Equation (20) [19], where $K_c = 0.5\% \cdot$µm and $A_c$ are the Pelgrom mismatch parameter and the least significant bit (LSB) capacitor area, respectively. From Equation (20), $A_c$ has to be larger than (4.5 µm)², which implies $C_{LSB} > 19.5$ fF, considering a specific capacitance of 1 fF/µm².

Taking into account also the effect of parasitic capacitance, which can degrade the achievable accuracy, $C_{LSB} = 80$ fF was employed as unit capacitor, resulting in a total capacitance of $2^N \cdot C_{LSB} = 20.48$ pF.
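The sizing margins of the capacitor array can be made explicit with a few numbers. The sketch below assumes that the sampling noise is $kT/C_{TOT}$, that the quantization noise is LSB²/12, that capacitor mismatch follows $\sigma = K_c/\sqrt{A_c}$, and that the temperature is 300 K. The FSR is not stated at this point of the text, so 2 V is used purely as an illustrative value.

```python
import math

k_B = 1.380649e-23    # Boltzmann constant (J/K)
T = 300.0             # assumed temperature (K)
n_bits = 8
c_lsb = 80e-15        # unit capacitor from the text (F)
c_tot = 2 ** n_bits * c_lsb


def min_c_tot(fsr):
    """Capacitance (F) at which kT/C noise equals the quantization noise."""
    return 12 * k_B * T * 4 ** n_bits / fsr ** 2


def mismatch_sigma_percent(area_um2, k_c=0.5):
    """Relative capacitor mismatch (%) from the Pelgrom relation K_c/sqrt(A)."""
    return k_c / math.sqrt(area_um2)


fsr = 2.0             # illustrative full-scale range (V); an assumption
print(f"C_TOT = {c_tot * 1e12:.2f} pF")
print(f"kT/C limit = {min_c_tot(fsr) * 1e15:.2f} fF")
print(f"sampling noise = {math.sqrt(k_B * T / c_tot) * 1e6:.1f} uV rms")
# 1 fF/um^2 specific capacitance: 80 fF corresponds to 80 um^2.
print(f"mismatch at (4.5 um)^2 = {mismatch_sigma_percent(4.5 ** 2):.3f} %")
print(f"mismatch at 80 um^2    = {mismatch_sigma_percent(80.0):.3f} %")
```

Under these assumptions the chosen array exceeds the thermal-noise limit by more than four orders of magnitude, so the unit capacitor is set by matching and parasitics rather than by kT/C noise, as the text argues.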
The comparator was implemented as a classical two-stage topology, as depicted in Figure 4. A 20 dB gain preamplifier is followed by a PMOS latch stage, and a pair of inverters is connected at the latch outputs to regenerate the signals. The two-stage architecture was preferred to a dynamic latch in order to avoid kickback noise and metastability [20,21].
The successive approximation logic has been implemented as synchronous full-custom logic. The A/D converter works with a clock frequency of 20 MHz, providing 1.25 MS/s, i.e., about 64 channels × 20 kS/s per channel. Hence, a conversion period lasts 16 clock cycles (see Figure 5): the sampling phase lasts 7 clock periods to relax the specifications of the input signal buffer, while every bit decision takes one clock period, starting from the most significant bit (MSB). The last clock cycle is reserved for the end-of-count (EOC) operations. The current consumption of the input buffer and of the A/D converter is 250 and 410 µA, respectively.
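The timing budget of one conversion and the resulting sample rates can be tallied as below, followed by a behavioural sketch of the successive-approximation search itself. The search routine is a generic textbook model, not the full-custom logic of the chip, and the 3 V full-scale range used in the example call is only an illustrative value.

```python
F_CLK = 20e6                       # ADC clock (Hz), from the text
N_CHANNELS = 64

# Clock cycles per conversion, as described in the text.
cycles = {"sampling": 7, "bit_decisions": 8, "end_of_count": 1}
cycles_per_conversion = sum(cycles.values())

f_conv = F_CLK / cycles_per_conversion      # aggregate conversion rate (S/s)
f_channel = f_conv / N_CHANNELS             # per-channel sampling rate (S/s)
f_mux = f_conv / 2                          # mux clock: one channel per clock edge


def sar_convert(v_in, fsr, n_bits=8):
    """Generic SAR search: test bits from the MSB down, keep those that fit."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if v_in >= trial * fsr / 2 ** n_bits:
            code = trial
    return code


print(f"{cycles_per_conversion} cycles -> {f_conv / 1e6:.2f} MS/s total")
print(f"per channel: {f_channel:.2f} S/s, mux clock: {f_mux / 1e3:.0f} kHz")
print("mid-scale code:", sar_convert(1.5, 3.0))
```

The per-channel rate works out slightly below the nominal 20 kS/s, and the multiplexer clock derived here matches the 625 kHz quoted for the line buffer and multiplexer.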

Digital Signal Processing
The implemented data reduction system takes advantage of the low duty cycle of the neural activity to avoid transmitting signal segments that contain only noise. Spike duration and firing rates are typically lower than 1 ms and 100 Hz [22], respectively. A finite-state machine (FSM in Figure 6) compares each incoming digital sample with a user-programmable digital threshold, i.e., an 8 bit word stored in the embedded SRAM. When the threshold is crossed, 20 samples of the signal from the same channel are recorded, covering a 1 ms time frame (see Figure 7). This strategy was chosen taking into account that a clear identification of APs requires the detection of three features, namely the amplitude of the peak and the trough and the time interval between them [22], which can be extracted with enough resolution by using 20 samples, even for the fastest spikes. The threshold can be set by the user before the recording starts, choosing among four predefined digital words: "x1000000", "x0100000", "x0010000" and "x0001000" (corresponding to 1/2, 1/4, 1/8 and 1/16 of half the ADC full scale range). Since the spike can have a first peak either positive or negative and the baseline is at the middle of the ADC range, each sample is compared against both a positive and a negative threshold. For example, considering the first threshold, the FSM is triggered if the incoming sample is larger than "11000000" or lower than "01000000". If the gain of the amplification chain is set to the maximum value (83 dB ≅ 14000), these four thresholds correspond to input-referred amplitudes of ±54 µV, ±27 µV, ±13 µV and ±7 µV. The recorded samples are stored in a 2 kbit embedded memory (SRAM in Figure 6) together with the channel address and the time stamp, both with an 8 bit resolution. Each spike is thus compressed into 22 bytes. The SRAM is continuously read at a 1 Mbit/s rate by the control logic block, which adds service bits for synchronization at the receiver and then performs Manchester encoding of the bit stream. The final data rate reaching the transmitter is 1.25 Mbit/s. The memory read speed was determined by taking into account two aspects:
• the lower the read speed, the higher the compression factor; a 1 Mbit/s read speed results in a compression factor of about 10 on the payload (from 10.24 Mbit/s, corresponding to 64 channels sampled at 20 kHz per channel with 8 bit per sample, to 1 Mbit/s, i.e., a 1.25 Mbit/s data rate once the service bits are added) and allows saving a considerable amount of power and bandwidth at the transmitter side;
• considering the worst-case scenario of 64 channels firing at 100 spikes/s and 20 samples per spike with an 8 bit resolution, the average data throughput is about 1 Mbit/s.
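The detection and windowing scheme described above can be sketched in software for a single channel. This is a behavioural illustration only, not the on-chip FSM: the record layout (channel byte, then time-stamp byte, then samples), the use of the sample index modulo 256 as time stamp, and the choice to start the 20-sample window at the crossing sample are all assumptions, and `detect_spikes` is a hypothetical name.

```python
MID_SCALE = 0x80                       # baseline at the middle of the 8 bit range
# Threshold offsets from mid-scale: 1/2, 1/4, 1/8 and 1/16 of half the FSR.
THRESHOLD_OFFSETS = (0x40, 0x20, 0x10, 0x08)
WINDOW = 20                            # samples stored per spike (1 ms at 20 kS/s)


def detect_spikes(samples, channel, threshold_index=0):
    """Return a list of 22-byte records, one per detected spike.

    samples: sequence of 8 bit ADC codes (0..255) from one channel.
    Each record is: channel address, time stamp, then WINDOW samples
    (layout assumed for illustration).
    """
    offset = THRESHOLD_OFFSETS[threshold_index]
    upper, lower = MID_SCALE + offset, MID_SCALE - offset
    records = []
    i = 0
    # Crossings too close to the end to fill a whole window are ignored.
    while i + WINDOW <= len(samples):
        if samples[i] > upper or samples[i] < lower:
            header = bytes([channel & 0xFF, i & 0xFF])
            records.append(header + bytes(samples[i:i + WINDOW]))
            i += WINDOW                # skip the samples already stored
        else:
            i += 1
    return records


trace = [128] * 30 + [200, 210, 90, 120] + [128] * 40
for rec in detect_spikes(trace, channel=5):
    print(len(rec), "bytes, channel", rec[0], "time stamp", rec[1])
```

With the first threshold the comparison levels are "11000000" (192) and "01000000" (64), as in the text, and each detected spike occupies 22 bytes.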
The memory was sized to reduce missed spikes when burst activity is present. Monte-Carlo simulations were performed assuming that the neural firing rate follows Poisson statistics. Figure 8 shows the missing spike percentage as a function of the average firing rate for each of the 64 channels and for different memory sizes. Opting for a 1 Mbit/s read speed and a 2 kbit RAM size, the missing spike percentage is less than 0.1% for 64 channels firing at 50 spikes/s and less than 10% for all channels firing at 100 spikes/s. With these choices, the memory is able to store spikes that occur in the same 1 ms frame in 11 consecutive channels before missing a spike. Each AP is transmitted, once completely stored, in 176 µs and thus the worst-case latency time (time difference between the detection and the transmission of an AP) is about 2.9 ms, which is not an issue either in neuroscience experiments or in future prosthetic applications [23]. The validity of the data-reduction algorithm was tested by running custom clustering software that employed a fuzzy c-means spike sorter [24,25] on a data set from a public source [26] with spike waveforms hardly distinguishable from one another (see Figure 9). The ratio between the peak-to-peak spike amplitude and the rms noise was approximately 10. Clustering based on Principal Component Analysis (PCA) was performed on original data containing spikes from three different neurons and on the same data reduced according to the above windowing strategy. Figure 9 shows that clustering on the reduced data set leads to a clear separation of the three spike families, with identification errors reported in Table 2.
Two types of error occur in a PCA-based clustering: type I errors occur when APs from two different neurons are grouped together (false positives), whilst type II errors occur when not all APs generated by one neuron are grouped together (false negatives). When applied to the original dataset, the PCA-based identification procedure leads to type I and type II mean error rates of 4.1% and 5.2%, respectively. The reduced data sets showed values of 4.6% and 5.3%, thus demonstrating that the reduction strategy preserves the quality needed for effective neuron identification. These results are not surprising considering that, as stated in [22], more than 80% of neural spike information is carried by three analog features of the AP waveform, namely the peak and the trough amplitudes and the time between them. The results of spike sorting using these analog features [27] and performed on the same trace are also reported in Table 2. Note that the compression method employed in this work always outperforms the clustering obtained using only these three features. Finally, it is worth noting that the current consumption of the overall digital signal processing is limited to about 400 µA, most of which is consumed by the finite-state machine.
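To make the three analog features concrete, the sketch below extracts the peak amplitude, the trough amplitude and the time between them from a stored spike window. The function name and the sample values are hypothetical; this is not the feature extractor of [27], only an illustration of what the features are.

```python
def spike_features(samples, sample_rate_hz):
    """Return (peak amplitude, trough amplitude, peak-to-trough time in seconds)."""
    peak_idx = max(range(len(samples)), key=lambda i: samples[i])
    trough_idx = min(range(len(samples)), key=lambda i: samples[i])
    dt = abs(trough_idx - peak_idx) / sample_rate_hz
    return samples[peak_idx], samples[trough_idx], dt


# Hypothetical 20-sample window (arbitrary ADC units), sampled at 20 kHz.
window = [0, 2, 10, 35, 60, 42, 5, -30, -48, -33,
          -20, -12, -8, -5, -3, -2, -1, 0, 0, 0]

peak, trough, dt = spike_features(window, 20_000)
print(f"peak={peak}, trough={trough}, peak-to-trough={dt * 1e6:.0f} us")
```

Each spike is thus summarized by three numbers, whereas the windowing strategy keeps all 20 samples.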

Manchester-Coded FSK Modulator
The 1.25 Mbit/s bit stream is transmitted using a Manchester-coded binary Frequency Shift Keying (MC-2FSK) modulation at a carrier frequency of 400 MHz. The transmitter (see Figure 10) consists of a voltage-controlled oscillator (VCO) directly modulated by the digital data, inserted in a Phase-Locked Loop (PLL). Since the modulator is a power-hungry system, the design of the whole PLL, and in particular of the high-frequency oscillator, has to be carried out with great care.
Figure 10. Schematic of the Phase-Locked Loop adopted as frequency synthesizer and modulator. The Manchester-coded data stream directly modulates the VCO. The Power Amplifier adopts an external resonant filter to efficiently deliver power to the antenna, allowing a long transmission range.

The oscillator has a current-biased topology with a double cross-coupled pair transconductor. This oscillator architecture minimizes the current consumption for a given phase noise with respect to the topology featuring a single differential pair transconductor [28]. In order to accurately set the PLL output frequency at the desired value of 400 MHz and to compensate for process, temperature and power-supply variations, the oscillator has to span over ±10% of its central frequency, i.e., a band of about ±40 MHz. This means that the capacitance has to be varied as:

$$\frac{\Delta C}{C} \simeq 2\,\frac{\Delta f_0}{f_0} \qquad (21)$$

where $f_0$ is the central oscillation frequency (400 MHz), $\Delta f_0$ the covered band of 80 MHz and $C$ the tank capacitor. To make the oscillation frequency insensitive to the parasitic capacitance of the active devices, routing, pads and package pins (the inductor is off-chip), the tank capacitance has to be set in the pF range. Thus, we designed the oscillator with C tunable from 6 pF to 11 pF, determining an inductance value of about 20 nH. The tank capacitance is implemented by three digitally-controlled binary-weighted NMOS inversion-mode varactors that perform coarse tuning of the oscillation frequency and an analog NMOS inversion-mode varactor for fine-tuning of the frequency, which closes the loop of the PLL.
Another small NMOS varactor is also adopted for frequency modulation.
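As a quick plausibility check of the tank sizing, the sketch below evaluates the resonance frequency of an ideal LC tank at the two ends of the capacitance range. It ignores parasitics and varactor non-idealities, so the numbers are only indicative; the function name is illustrative.

```python
import math


def lc_resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of an ideal LC tank: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


L_TANK = 20e-9                  # inductance quoted in the text
C_MIN, C_MAX = 6e-12, 11e-12    # tunable tank capacitance quoted in the text

f_high = lc_resonance_hz(L_TANK, C_MIN)
f_low = lc_resonance_hz(L_TANK, C_MAX)
print(f"ideal tuning range: {f_low / 1e6:.0f} MHz to {f_high / 1e6:.0f} MHz")

# Since f is proportional to 1/sqrt(C), a small relative frequency change
# df/f requires a relative capacitance change of about 2*df/f.
print(f"dC/C for an 80 MHz band around 400 MHz: {2 * 80e6 / 400e6:.0%}")
```

Under these idealized assumptions the 6-11 pF range comfortably brackets the 400 MHz target.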
Once the tank parameters have been determined, the current $I_{BIAS}$ of the tail generator and the (W/L) ratio of the pair transistors have to be chosen taking into account the following considerations:
• The small-signal loop gain or excess gain [29] of the oscillator ($g_m R_{TANK}$, where $g_m$ is the small-signal transconductance of the double differential pair and $R_{TANK}$ is the tank equivalent parallel resistance) has to be larger than 1 in order to ensure oscillation start-up. Since the common-mode output voltage of the oscillator is set to half of the power supply to maximize the oscillation swing, i.e., $V_{DD}/2$ = 1.5 V, the start-up condition of Equation (22) corresponds to a current larger than about 220 µA. Note that in Equation (22) it was assumed that the overall transconductance $g_m$ of the double differential pair is equal to the single MOS transconductance and that NMOS and PMOS transistors have the same overdrive voltage.
• The differential oscillation amplitude, $A_0$, has to be sufficiently large (>1.5 V) to drive the subsequent stages, i.e., the frequency dividers and the power amplifier. Since, according to Equation (23), $A_0$ grows with both $I_{BIAS}$ and the tank quality factor Q [30], the current $I_{BIAS}$ is minimized by adopting an inductor with the maximum quality factor. We opted for an off-chip inductor of 21 nH featuring a Q of about 70 at 400 MHz. This choice results in a current larger than 320 µA to achieve the desired voltage amplitude.
• The VCO phase noise, which mainly determines the phase noise of the whole synthesizer at frequency offsets larger than the PLL loop bandwidth, has to be sufficiently low in order not to worsen the signal-to-noise ratio (SNR) of the modulated signal. This can be evaluated as the single-sideband-to-carrier ratio, i.e., as the ratio of the power in a 1 Hz bandwidth at an offset $\omega_m$ from the fundamental angular frequency $\omega_0$ to the power of the carrier, giving the expression of Equation (24) [31], in which F is the noise factor of the transconductor (F ≅ 1). From behavioral simulations, the phase noise at 1 MHz offset from the carrier has to be lower than −120 dBc/Hz in order not to degrade the frequency-modulated signal.
• A resonant filter (at twice the oscillation frequency) connected to the source of the NMOS differential pair transistors (see Figures 10 and 11) can be added to reduce flicker noise up-conversion into phase noise, which can severely limit the VCO performance [29].
For $I_{BIAS}$ = 400 µA, C in the 6-11 pF range and Q ≅ 70, the excess gain is approximately 2 and the oscillation amplitude is larger than 2 V, preventing the tail current generator from entering the ohmic region and avoiding phase noise degradation. Moreover, these choices ensure that the phase noise is not an issue, being better than −150 dBc/Hz at 1 MHz frequency offset.
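The bias-current constraints listed above can be retraced numerically. The sketch below is a rough check under two assumptions that are not taken from the text: the tank parallel resistance is modeled as $R_{TANK} = Q\,\omega_0 L$, and the differential amplitude of a current-limited oscillator with complementary cross-coupled pairs is approximated as $A_0 \approx (4/\pi)\, I_{BIAS} R_{TANK}$.

```python
import math

F0_HZ = 400e6
L_TANK = 21e-9      # off-chip inductor quoted in the text
Q_TANK = 70         # quality factor quoted in the text
A0_MIN_V = 1.5      # minimum differential amplitude quoted in the text

omega0 = 2 * math.pi * F0_HZ
r_tank = Q_TANK * omega0 * L_TANK       # assumed model of the parallel resistance

# Start-up: g_m * R_TANK > 1, so the minimum transconductance is 1 / R_TANK.
gm_min = 1.0 / r_tank

# Assumed amplitude law: A0 = (4/pi) * I_BIAS * R_TANK (current-limited regime).
i_bias_min = A0_MIN_V * math.pi / (4 * r_tank)

print(f"R_TANK            ~ {r_tank:.0f} ohm")
print(f"minimum g_m       ~ {gm_min * 1e3:.2f} mS")
print(f"I_BIAS for 1.5 V  ~ {i_bias_min * 1e6:.0f} uA")
```

With these assumptions the minimum current for a 1.5 V amplitude comes out close to the 320 µA stated in the text, which suggests the approximations are reasonable for a first-order estimate.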
The VCO is followed by a cascade of a divide-by-2 dynamic logic prescaler and a static modulo-10 CMOS counter, which further divides the frequency by 10. The PLL reference is the same 20 MHz master clock, allowing the loop to set the oscillation frequency at 20 times the reference clock, i.e., 400 MHz. The PLL also includes a phase-frequency detector with a charge-pump driving an RC-C off-chip filter (see Figure 10), which sets the loop bandwidth to 50 kHz. Excluding the VCO, the power consumption is mainly due to the high-frequency prescaler, which draws more than 200 µA. The current consumption of the whole synthesizer is about 700 µA.
The synthesizer frequency is modulated in an open-loop mode by the data at 1.25 MHz, i.e., at 1.25 Mbit/s rate with Manchester encoding. The loop is effectively open for the frequency modulation since the PLL bandwidth is significantly lower than the bit rate. This prevents the PLL from filtering out the induced frequency modulation. The only purpose of the loop is to fix the central oscillation frequency in order to avoid frequency drift due to temperature and power supply variations. To limit the occupied bandwidth, a frequency deviation of about ±400 kHz was adopted. Considering a modulation frequency of 1.25 MHz, this value corresponds to a modulation index of 0.64. The bandwidth of the transmitted signal is determined by the Carson rule [32]:

$$BW \simeq 2\,(\Delta f + f_m) \qquad (25)$$

where $\Delta f$ and $f_m$ are the peak frequency deviation (400 kHz) and the modulation frequency (1.25 MHz), respectively.
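The modulation scheme described above can be illustrated with a minimal sketch. The Manchester convention used here (1 mapped to a low-to-high transition, as in IEEE 802.3) is an assumption, since the text does not state which polarity the chip uses, and the modulation index is computed as twice the deviation divided by the bit rate, which is the definition that reproduces the quoted value of 0.64.

```python
def manchester_encode(bits):
    """Map each bit to two half-bit chips (assumed convention: 1 -> 0,1 and 0 -> 1,0)."""
    chips = []
    for bit in bits:
        chips.extend((0, 1) if bit else (1, 0))
    return chips


def carson_bandwidth_hz(peak_deviation_hz, modulation_freq_hz):
    """Carson's rule: BW ~ 2 * (peak deviation + modulation frequency)."""
    return 2 * (peak_deviation_hz + modulation_freq_hz)


BIT_RATE = 1.25e6     # bit/s
DEVIATION = 400e3     # peak frequency deviation, Hz
F_MOD = 1.25e6        # modulation frequency quoted in the text, Hz

encoded = manchester_encode([1, 0, 0, 1])
mod_index = 2 * DEVIATION / BIT_RATE
bandwidth = carson_bandwidth_hz(DEVIATION, F_MOD)

print(encoded)
print(f"modulation index: {mod_index:.2f}")
print(f"Carson bandwidth: {bandwidth / 1e6:.1f} MHz")
```

Every encoded bit contains a mid-bit transition, so the modulating signal has no DC component, which is what allows a PLL with a bandwidth well below the bit rate to hold the center frequency without cancelling the data.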

Power Amplifier
The transmitter is completed by a class-AB power amplifier (PA) that drives a 50-Ω antenna via an off-chip resonant filter. The PA consists of an open-drain PMOS transistor directly connected to the oscillator output, sized to draw about 3.5 mA. Thus, the power delivered to the antenna is about 0 dBm considering a reasonable efficiency of 10%. The output power, and thus the power consumption of the PA, was set after estimating the sensitivity of the receiver, which depends on the modulation type, the signal bandwidth and the noise factor of the receiver itself. The receiver input noise in the signal bandwidth of 3 MHz is:

$$N_{in} = kT \cdot BW \cdot F \simeq -99\ \mathrm{dBm} \qquad (26)$$

where F is the noise factor of the receiver, which was assumed to be equal to 10 dB in the above equation. For a low modulation index Manchester-encoded modulation, the SNR required at the receiver input to ensure a bit error rate (BER) lower than $10^{-5}$ is $SNR_{min}$ ≅ 20 dB [33]. Thus the input power needed at the receiver is:

$$P_{R,min} = N_{in} + SNR_{min} \simeq -79\ \mathrm{dBm} \qquad (27)$$

Once the receiver sensitivity has been computed, it is possible to set the PA power to achieve a desired transmission range. Taking into account the Friis transmission equation, the ratio of power available at the input of the receiving antenna, $P_R$, to the output power of the transmitting antenna, $P_T$, is:

$$\frac{P_R}{P_T} = G_T\, G_R \left(\frac{\lambda}{4\pi d}\right)^2 \qquad (28)$$

where $G_T$ and $G_R$ are the gains with respect to an isotropic radiator of the transmitting and receiving antennas, respectively, λ is the wavelength, and d is the distance between the antennas. While the receiver antenna is a dipole antenna and its gain can be estimated to be 0 dB, the transmitter antenna is a whip antenna with a gain that can be as low as −20 dB. Note that Equation (28) represents the power at the receiver antenna, which is supposed to be matched with the receiver, so the receiver input power is 3 dB lower. Considering Equation (28) and setting the maximum transmission distance to 10 m, the power needed at the transmitter antenna is −11 dBm, where a 3 dB factor accounting for the matching between the receiver antenna and the receiver itself is considered. Finally, to have a safety margin, a PA featuring a 0 dBm output power was designed.

The chip was fabricated in the AMS 0.35 µm, 3 V 4M2P CMOS process and occupies a total area of 3.1 × 2.7 mm², pads included (see Figure 11). The preamplifier has a mid-band gain of about 65 dB, a high-frequency cut-off of 10.5 kHz and a low-frequency cut-off tunable from 1 Hz to 1 kHz (see Figure 12). The input-referred noise, for $f_{HP}$ = 300 Hz, is 3.05 µV, while the current consumption is 4 µA for the first stage, and 0.001 µA and 0.3 µA for the $G_M$-C high-pass filter and the second amplifying stage, respectively, resulting in an overall NEF of 2.5. The input-referred power spectral density is shown in Figure 13 for different values of the $G_M$-C filter high-pass corner frequency. The plateau of −152 dBV²/Hz corresponds to an input-referred noise of about (25 nV)²/Hz, close to the expected value. Note that, for $f_{HP}$ = 300 Hz, the 1/f² input-referred noise is mainly due to the $G_M$-C high-pass filter, as predicted by Equation (14). The common-mode rejection ratio (CMRR) and the power supply rejection ratio (PSRR) of the overall pre-amplifier are higher than 65 dB and 50 dB, respectively, while the cross-talk between adjacent channels due to the analog multiplexing is lower than −40 dB. With regard to the AD converter, Figure 14 shows the results of static measurements. The converter features a DNL less than 0.45 LSB and an INL less than 1 LSB.
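The preamplifier noise figures can be cross-checked with the commonly used definition of the noise efficiency factor, $NEF = V_{n,rms}\sqrt{2 I_{tot} / (\pi\, U_T\, 4kT\, BW)}$. The sketch below is only an estimate: the temperature (300 K), the interpretation of 3.05 µV as an rms value, and the choice of the 300 Hz to 10.5 kHz pass band as the noise bandwidth are assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23       # J/K
Q_ELECTRON = 1.602176634e-19     # C
T_KELVIN = 300.0                 # assumed temperature
U_T = K_BOLTZMANN * T_KELVIN / Q_ELECTRON   # thermal voltage kT/q


def noise_efficiency_factor(v_noise_rms, i_total, bandwidth_hz):
    """NEF = Vn,rms * sqrt(2*I / (pi * U_T * 4kT * BW))."""
    denom = math.pi * U_T * 4 * K_BOLTZMANN * T_KELVIN * bandwidth_hz
    return v_noise_rms * math.sqrt(2 * i_total / denom)


i_total = 4e-6 + 0.001e-6 + 0.3e-6    # stage currents quoted in the text
bandwidth = 10.5e3 - 300.0            # assumed noise bandwidth
nef = noise_efficiency_factor(3.05e-6, i_total, bandwidth)

# -152 dBV^2/Hz expressed as a voltage noise density in V/sqrt(Hz).
density = 10 ** (-152 / 20)

print(f"NEF estimate:  {nef:.2f}")
print(f"noise density: {density * 1e9:.1f} nV/sqrt(Hz)")
```

Under these assumptions the estimate lands near 2.4, in the neighborhood of the NEF of 2.5 reported above, and the plateau converts to about 25 nV/√Hz.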
Figure 15 shows the output spectrum of the ADC for a full-scale sine wave at 1 kHz as input signal. The signal-to-noise-plus-distortion ratio (SNDR) is close to 45 dB, and the effective number of bits (ENOB) is thus about 7.2. Considering the modulator, the frequency range covered by the VCO spans from 310 to 460 MHz (see Figure 16), allowing the PLL to lock at 400 MHz even with a large spread of tank component values and for sizable variations in power supply and temperature. When no data modulation is performed, the modulator phase noise (see Figure 17) is less than −123 dBc/Hz at 1 MHz offset from the carrier. The phase noise spectrum has a typical low-pass shape, with 50 kHz cut-off frequency, since it is completely dominated by the charge-pump noise [34], while the VCO phase noise is much lower. The rms integral frequency noise of the modulator in the 10 Hz to 10 kHz band is less than 3 kHz (see the residual FM in Figure 17), whereas the output spectrum of the synthesizer when acting as frequency modulator is depicted in Figure 18: more than 98% of the signal power is included in a 3 MHz band. For a λ/4 whip antenna at the transmitter and a dipole antenna at the receiver side, the received power in free space is shown in Figure 19. The sensitivity of the receiver has been measured to be −74 dBm for a BER of $10^{-5}$, close to the estimated theoretical value of −79 dBm predicted by Equation (27), allowing a transmission range larger than 30 m. The system has also been successfully tested for a 10 m distance between transmitter and receiver in a hostile environment such as a neuroscience laboratory. The PA shows an efficiency of about 7% while delivering −2 dBm to a 50-Ω output load. Disabling the PA, the tank inductor may act as transmitting antenna; in such measurements about −70 dBm was received by a loop antenna placed at approximately 5 cm from the tank inductor.
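The link-budget reasoning behind the PA sizing can be retraced with a few lines of arithmetic. The sketch below is a free-space estimate under stated assumptions: a thermal noise density of −174 dBm/Hz (room temperature), the antenna gains and the 3 dB matching loss quoted in the text, and no fading or body losses. Function and variable names are illustrative.

```python
import math


def friis_gain_db(g_t_db, g_r_db, freq_hz, distance_m):
    """P_R / P_T in dB from the Friis equation (free space)."""
    wavelength = 299_792_458.0 / freq_hz
    return g_t_db + g_r_db + 20 * math.log10(wavelength / (4 * math.pi * distance_m))


# Receiver sensitivity: noise floor in 3 MHz with F = 10 dB, plus SNR_min = 20 dB.
noise_floor_dbm = -174.0 + 10 * math.log10(3e6) + 10.0
sensitivity_dbm = noise_floor_dbm + 20.0

# Transmit power needed at 10 m with a -20 dB whip and a 0 dB dipole,
# including the 3 dB matching loss at the receiver.
link_db = friis_gain_db(-20.0, 0.0, 400e6, 10.0)
p_tx_needed_dbm = sensitivity_dbm + 3.0 - link_db

# PA output for 3.5 mA drawn from 3 V at 10 % efficiency.
p_pa_dbm = 10 * math.log10(3.5e-3 * 3.0 * 0.10 / 1e-3)

print(f"noise floor:       {noise_floor_dbm:.1f} dBm")
print(f"sensitivity:       {sensitivity_dbm:.1f} dBm")
print(f"required TX power: {p_tx_needed_dbm:.1f} dBm at 10 m")
print(f"designed PA power: {p_pa_dbm:.1f} dBm")
```

In this estimate the required transmit power at 10 m is roughly −12 dBm, so a PA delivering about 0 dBm leaves a margin of more than 10 dB.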

Electrical Characterization
The overall performance of the implemented system is provided in Table 3. The power consumption of the presented circuit has been computed by projecting the performance to a complete 64 channel system. To this end, the additional budget needed for the missing 48 amplifiers has been added to the actual power consumption (16.6 mW), leading to 17.2 mW (6.7 mW with the PA turned off). The estimate is fair since all the other circuit blocks work at the same data rate as in a full 64 channel system.

Figure 19. Received power as a function of the distance between transmitter and receiver antenna in free space. A dipole antenna was adopted at the receiver side.
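The projection to 64 channels can be checked with a short calculation. The sketch below uses only the figures quoted in the text; the per-amplifier estimate from the stage currents assumes that each amplifier draws its current from the full 3 V supply.

```python
MEASURED_POWER_MW = 16.6     # die with 16 amplifiers (from the text)
PROJECTED_POWER_MW = 17.2    # projected 64-channel system (from the text)
PA_OFF_POWER_MW = 6.7        # projected system with the PA disabled (from the text)
N_CHANNELS = 64
MISSING_AMPLIFIERS = 48

# Extra budget attributed to each missing amplifier.
per_amp_uw = (PROJECTED_POWER_MW - MEASURED_POWER_MW) * 1e3 / MISSING_AMPLIFIERS

# Independent estimate from the stage currents (4 + 0.001 + 0.3 uA) at 3 V.
per_amp_from_currents_uw = (4.0 + 0.001 + 0.3) * 3.0

per_channel_uw = PROJECTED_POWER_MW * 1e3 / N_CHANNELS
per_channel_pa_off_uw = PA_OFF_POWER_MW * 1e3 / N_CHANNELS

print(f"budget per added amplifier: {per_amp_uw:.1f} uW")
print(f"estimate from currents:     {per_amp_from_currents_uw:.1f} uW")
print(f"power per channel:          {per_channel_uw:.0f} uW")
print(f"per channel, PA off:        {per_channel_pa_off_uw:.0f} uW")
```

The two per-amplifier estimates agree to within a few percent, which supports the projected total.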

In Vivo Experiments
Figure 20 shows the system mounted on a rat for in vivo signal acquisition. It consists of two parts, a wireless neural recording headstage and a backpack. To limit disturbances collected from the environment, a small board (3 × 3 cm² and 4.5 g of weight) is directly connected to a 16 channel microelectrode array (Tucker-Davis), with an impedance in the 20-60 kΩ range at 1 kHz, implanted into the somatosensory cortex of an adult rat. The board includes the packaged chip and 10 external small surface-mount components (0805 or 0603 footprint): the VCO inductor, the power-amplifier resonant filter components (2 capacitors and 1 inductor), the 20 MHz quartz, the PLL loop filter (1 resistor and 2 capacitors) and 2 decoupling capacitors between power supplies. The antenna is a quarter-wavelength whip antenna: it consists of a piece of wire about 17 cm long, easily placed along the back of the rat. The backpack has a weight of 40 g and includes two AAA batteries with 1000 mAh capacity that may allow neural activity recordings for more than 100 hours. The two batteries are connected via three 10 cm long wires to the headstage.

At the beginning of the in vivo experiment, the quality of the signals from the implanted electrodes was analyzed using a commercial acquisition system by recording the raw signals and noise from the same 16 electrode channels. The recorded traces featured a noise of about 10 µV rms on all channels. The threshold of the digital peak processor was set to "x1000000" and the gain of the overall amplifying chain to the maximum value (83 dB) since the peak-to-peak spike amplitude was always lower than 100 µV. Thus, the digital threshold corresponds to an input-referred amplitude of about ±30 µV, i.e., ±3 times the rms noise. The high-pass cut-off frequency of the front-end amplifiers was set to about 300 Hz to properly reject the low-frequency signals and the power-line noise, enabling correct spike detection. The receiver dipole antenna was placed at two meters from the rat while neural activity was recorded. Figure 21 shows a single trace recorded during the experiment and two spike waveforms extracted from the data stream, while in Figure 22 a large number of spikes recorded on the same channel are shown aligned in time.
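The battery-life claim can be sanity-checked with an idealized estimate. The sketch below assumes that the two AAA cells are connected in series to provide the 3 V supply, that the full rated capacity is usable, and that there are no regulator losses; real operating time would be somewhat lower.

```python
BATTERY_CAPACITY_MAH = 1000.0   # per AAA cell (from the text)
SUPPLY_VOLTAGE_V = 3.0          # assumed: two cells in series, no regulator loss
SYSTEM_POWER_MW = 17.2          # projected 64-channel consumption (from the text)

current_ma = SYSTEM_POWER_MW / SUPPLY_VOLTAGE_V
lifetime_h = BATTERY_CAPACITY_MAH / current_ma

print(f"average current: {current_ma:.2f} mA")
print(f"ideal lifetime:  {lifetime_h:.0f} h")
```

Even with a generous derating of the ideal figure, the result stays above the 100 hours mentioned in the text.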

Discussion and Conclusions
Table 4 compares the implemented system to other wireless neural recording ICs with respect to the three most important features, i.e., the transmitted data type (raw data, spike detection or spike waveform), the transmission range and the power consumption per channel. For a fair comparison, since state-of-the-art wireless neural recording systems always feature a transmission range lower than 1 m, the proposed circuit can be considered as if it were sized for a similar transmission range. Since the power needed at the antenna is proportional to the square of the transmission distance, the PA power consumption could be scaled down by a factor of 900, i.e., (30 m/1 m)². In this case the equivalent power consumption of the whole system reduces to about 6.7 mW, corresponding to less than 105 µW per channel. Compared with the system described in [3], which works in the same frequency range, the proposed system features a lower power consumption per channel. The data quality is also greatly improved since spike detection is replaced by waveform detection, making single spike identification possible over a large microelectrode array. Compared with the IC in [7], our system shows an improvement of a factor larger than 2 in terms of power consumption per channel. It is true that the data rate does not allow the acquisition of all raw data as in [7], but our design optimizes information transfer and is potentially compliant with the narrowband spectral limitations of the Medical Implant Communication Service (MICS) band (402-405 MHz). In the future, this system could also be designed to work in the Industrial, Scientific and Medical (ISM) band (902-928 MHz).
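The range-scaling argument can be made explicit as follows. The sketch assumes that only the PA contribution scales, that it scales with the square of the distance ratio as in free space, and that the PA efficiency stays constant; the split between PA and non-PA power is taken from the 17.2 mW and 6.7 mW figures quoted earlier.

```python
REST_POWER_MW = 6.7                  # system with the PA turned off (from the text)
PA_POWER_MW = 17.2 - REST_POWER_MW   # PA share of the projected budget
N_CHANNELS = 64


def scaled_total_mw(range_from_m, range_to_m):
    """Total power when only the PA power is scaled by the squared distance ratio."""
    factor = (range_from_m / range_to_m) ** 2
    return REST_POWER_MW + PA_POWER_MW / factor


total_1m = scaled_total_mw(30.0, 1.0)
per_channel_uw = total_1m * 1e3 / N_CHANNELS

print(f"PA power at 30 m:    {PA_POWER_MW:.1f} mW")
print(f"total power at 1 m:  {total_1m:.2f} mW")
print(f"per channel at 1 m:  {per_channel_uw:.1f} uW")
```

After scaling, the PA contributes only about 12 µW, so the total is essentially the PA-off consumption.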
The system in [6] is better in terms of power per channel but, relying on a UWB transmitter, it can suffer from interference. Moreover, this system has never been tested in in vivo experiments and no information on the adopted antenna or on the power delivered by the PA is given. Regarding the system presented in [35], its power consumption per channel (80 µW/ch) was evaluated considering the minimum transmitter power, assuming that it allows a 1 m transmission distance. However, the data rate is limited to 1.5 Mbit/s, translating into data acquisition at a sampling rate <3 kS/s from all 64 channels. Higher sampling rates are possible only for a limited number of channels: assuming that 20 kS/s is the minimum sampling rate to acquire good quality neural signals, the maximum number of channels that can be recorded and transmitted simultaneously is only 9.
In conclusion, thanks to the implemented compression algorithm, which preserves the information needed for single neuron identification while limiting the data throughput and the bandwidth, we achieved a good balance among power consumption, quality of the transmitted data and transmission range. In addition, the power consumption per channel was lowered by a careful sizing of each individual circuit block.
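The sampling-rate and channel-count limits quoted for the 1.5 Mbit/s link follow from dividing the link rate among raw samples. The sketch below assumes 8 bit per sample and no protocol overhead, neither of which is stated in the text for the system in [35]; the function names are illustrative.

```python
def max_sample_rate_hz(link_bit_rate, n_channels, bits_per_sample=8):
    """Highest per-channel sampling rate when all channels stream raw data."""
    return link_bit_rate / (n_channels * bits_per_sample)


def max_channels(link_bit_rate, sample_rate_hz, bits_per_sample=8):
    """Largest whole number of channels that fit in the link at a given sampling rate."""
    return int(link_bit_rate // (sample_rate_hz * bits_per_sample))


rate_all_channels = max_sample_rate_hz(1.5e6, 64)
channels_at_20k = max_channels(1.5e6, 20_000)

print(f"64 channels: {rate_all_channels:.0f} S/s per channel")
print(f"20 kS/s:     {channels_at_20k} channels")
```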

Figure 1. Block diagram of the wireless neural recording IC.

Figure 2. Block diagram of the receiver (a) and photo of the remote receiver with the detail of the graphical user interface (b).

Figure 3. Schematic of the low-noise band-pass neural pre-amplifier.

Figure 4. Schematic of the two-stage comparator. A preamplifier is followed by a cascoded semi-latch to avoid kickback noise and metastability.

Figure 6. Block diagram of Digital Signal Processing.

Figure 7. Data compression algorithm: when a spike is detected, 20 digital samples are stored in the embedded memory, together with channel address and timing stamp information.

Figure 8. Missing spike percentage as a function of the average firing rate per channel for different memory sizes and for a read speed of 1 Mbit/s. Axes: average firing rate per channel [Hz] versus missing spikes [%].

Figure 9. Waveforms of neural spikes and PCA-based clustering results employing the proposed windowing strategy, i.e., processing 20 samples after the threshold crossing.

Figure 11. Photo of the board adopted during the neuroscience experiments with rats and microphoto of the die.

Figure 12. Measured transfer function for different high-pass corner frequencies set by the $G_M$-C filter.

Figure 13. Measured input-referred noise for different high-pass corner frequencies.

Figure 14. Measured DNL and INL of the integrated AD converter.

Figure 16. Oscillation frequency as a function of the analog tuning voltage for different coarse tuning bit configurations.

Figure 17. Measured phase noise of the phase-locked loop when the modulation is off.

Figure 18. Spectrum of the modulated signal.

Figure 20. Arrangement for in vivo measurements. Note the headstage, the battery backpack and the whip antenna placed along the back of the animal.

Figure 21. Example of a reconstructed trace recorded in an in vivo experiment.

Table 4. Comparison of wireless neural recording ICs.