An Area- and Energy-Efficient 16-Channel, AC-Coupled Neural Recording Analog Frontend for High-Density Multichannel Neural Recordings

We present an AC-coupled, modular 16-channel analog frontend with an energy-area product of 1.774 fJ/c-s·mm² for multichannel recording of broadband neural signals, including local field potentials (LFPs) and extracellular action potentials (EAPs). To achieve such a small energy-area product, we employed an operational transconductance amplifier (OTA) with local positive feedback instead of the folded cascode OTA (FC-OTA) or current mirror OTA widely used in conventional neural recording frontends, while optimizing the design parameters that govern the performance, power, and area trade-offs. In addition, a second pole was strategically introduced in the low noise amplifier (LNA) to reduce the noise bandwidth without an in-channel low-pass filter. Compared to conventional works, the presented prototype shows better performance in terms of noise, power, and area usage. The performance of the fabricated 16-channel analog frontend is fully characterized in a benchtop and an in vitro setup. Fabricated in a 180 nm 1P6M CMOS process, the 16-channel frontend captures LFPs and EAPs with 4.27 µVrms input referred noise (IRN) (0.5–10 kHz) and a 53.17 dB dynamic range, while consuming 3.44 µW and 0.012 mm² per channel. The channel figure of merit (FoM) of the prototype is 147.87 fJ/c-s, and the energy-area FoM (E-A FoM) is 1.774 fJ/c-s·mm².


Introduction
An in-depth understanding of the brain's activities will require large-scale recordings from multiple neuronal structures. For such large population recordings, extracellular neural recording has been recognized as one of the most powerful techniques due to its high spatial and temporal resolution; accordingly, the related research tools have been steadily advanced [1][2][3]. Extracellular neural recording requires high-density, high-quality signal acquisition without unnecessary intervention over long periods, which poses significant engineering challenges. From the perspective of integrated circuit design for neural recording frontends, these challenges can be enumerated as follows: considering the dense population of neurons in the brain (e.g., >1000 neurons within a radius of 140 µm in the rat cortex [1]), multichannel frontend circuits that fit into a small area of the brain are required; due to the tiny amplitude (~100 µV) and high dynamic range (DR) (~60 dB) of extracellular neural signals, good recording quality must be provided [4]; and low-power operation is essential because heat dissipated by a high-density frontend circuit can negatively affect the surrounding living tissue. These requirements are not easy to meet individually and are even harder to meet simultaneously, since they are interrelated through integrated circuit design trade-offs [5]. To break these trade-offs, extensive research on neural recording frontend circuit design has been conducted [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21].
Some works have accomplished ultralow-power operation [18], others have realized the frontend circuit within a tiny area [8,19], and, recently, a few works have tried to achieve both simultaneously [9,11].
To quantitatively measure the performance of neural recording frontend circuits, the authors of [9,10] proposed the energy-area figure of merit (E-A FoM), in which the per-channel energy efficiency (Channel FoM) and the per-channel area of a neural recording frontend are multiplied; it indicates how efficiently a given frontend circuit uses area and energy to provide a certain performance (smaller is better). In addition to this FoM, a few important specifications, such as input referred noise (IRN), signal-to-noise (and distortion) ratio (SN(D)R), and bandwidth, allow one to readily compare a neural recording frontend circuit with other similar implementations. This work, for instance, exhibits an E-A FoM of 1.774 fJ/c-s·mm², calculated from an energy efficiency of 147.87 fJ/c-s and an area of 0.012 mm² per channel. The other key specifications are a channel dynamic range of 53.17 dB, an IRN of 4.27 µVrms, and a power consumption of 3.44 µW per channel at a sampling frequency of 31.25 kS/s. This E-A FoM is very small compared to other works (see Table 1), while the specifications for broadband neural recording are satisfied.
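Because the E-A FoM is defined above as the product of the Channel FoM and the per-channel area, the headline number can be checked with one multiplication. The minimal Python sketch below uses only values quoted in this section; the function names are ours, for illustration. It also shows that only a per-channel area of 0.012 mm² reproduces 1.774 fJ/c-s·mm². The energy-per-sample helper is an extra illustration and is not the paper's Channel FoM definition.

```python
def energy_area_fom(channel_fom_fj_per_cs: float, area_mm2: float) -> float:
    """E-A FoM in fJ/c-s·mm²: Channel FoM multiplied by per-channel area."""
    return channel_fom_fj_per_cs * area_mm2


def energy_per_sample_j(power_w: float, sample_rate_sps: float) -> float:
    """Energy spent by one channel per sample (J); illustrative only."""
    return power_w / sample_rate_sps


if __name__ == "__main__":
    print(f"E-A FoM with 0.012 mm^2:  {energy_area_fom(147.87, 0.012):.3f}")
    print(f"E-A FoM with 0.0012 mm^2: {energy_area_fom(147.87, 0.0012):.4f}")
    print(f"Energy per sample: {energy_per_sample_j(3.44e-6, 31.25e3) * 1e12:.2f} pJ")
```

The first product is about 1.774, matching the abstract, whereas a 0.0012 mm² area would give a value ten times smaller.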
This work was motivated by the performance of our prior works [22,23], shown on the left of Figure 1, where a 256-channel CMOS frontend circuit (sixteen 16-channel modules) and a 256-channel flexible polyimide neural probe are combined through a hybrid interconnection using anisotropic conductive film (ACF) to realize an ultrahigh-density 256-channel broadband neural recording system. In [22,23], to allow flip-chip bonding with the flexible neural probe, each low noise amplifier pixel (+buffer) was placed under an area pad with a tiny pitch of 75 µm, eliminating input pads in the chip perimeter. Although this approach was successful, essential circuit performance metrics, such as IRN and power consumption, were compromised by the restricted area. Thus, this work focuses on achieving a lower IRN with lower power consumption while inheriting the same pixel pitch and further reducing the area consumption through novel circuit design techniques.
Figure 1. A high-density 256-channel neural recording system (left) where a 256-channel analog frontend circuit and flexible polyimide neural probe are hybrid-combined using anisotropic conductive film (ACF) [23]. A system where a commercial head-stage (RHD 2132, Intan Technology LLC, Los Angeles, CA, USA) and a 64-channel probe are assembled with two RHD 2132 chips is also shown for comparison.
Electronics 2021, 10, 1972 3 of 15
This paper is organized as follows: Section 2 provides an overview of the AC-coupled 16-channel neural recording frontend, details of the area-aware design applied to the low noise amplifier pixel and its consequent benefits, and the modification of the pixel structure that further reduces the area consumption. Section 3 presents the measurement results from the benchtop and in vitro setups and compares the measured performance with other recent state-of-the-art works. Finally, Section 4 concludes this paper.

A 16-Channel, AC-Coupled Analog Frontend
Analog frontend integrated circuit architectures for neural recording that achieve decent performance while balancing power and area consumption have recently been researched extensively. As summarized in [11], they can be roughly classified into three distinct architectures: (1) AC-coupled amplifier and analog-to-digital converter (ADC), (2) DC-coupled amplifier and ADC, and (3) DC-coupled direct ADC. The first architecture is one of the most popular and widely adopted topologies [4,6,10,24–27]. It provides decent, balanced performance in terms of noise, gain uniformity, distortion, and so on, while consuming relatively high power and large area [6]. The DC-coupled amplifier and ADC architecture was introduced to reduce the area consumption by eliminating the bulky input coupling capacitors of the AC-coupled architecture, but it requires extra area and power for its mandatory mixed-signal feedback loop [8,19]. The DC-coupled direct ADC has been introduced most recently. Frontend ∆-modulation with switched-capacitor-based sampling [28] and difference-differential input stage-based ∆-modulation [29] can achieve a rail-to-rail input range, but they require a high oversampling rate, which limits their application to low-frequency neural signals, such as the electroencephalogram (EEG), intracranial EEG, local field potential (LFP), and electrocorticogram (ECoG). In this work, the conventional AC-coupled architecture was chosen to retain its advantages over the newer architectures, such as excellent gain uniformity across channels, lower design effort owing to well-established design procedures, inherent passive offset cancellation, and relatively high input impedance, while its high power and area consumption are compensated by the proposed circuit design techniques.

Integrated Circuit Architecture
Our modular and expandable 16-channel architecture is shown in Figure 2. It consists of 17 low noise amplifiers (LNAs) and time-division multiplexing (TDM) buffers, a TDM switch array, a programmable gain amplifier (PGA), a successive approximation register (SAR) analog-to-digital converter (ADC), and a serial peripheral interface (SPI). All voltage and current references are generated internally. To save area, the pixel is implemented with a single-ended input and output, which halves the passives compared to conventional AC-coupled structures [22,23]. The input signals are coupled into single-ended LNAs, which do not provide any common-mode (CM) rejection; this is compensated by a replica reference, in which an identical LNA receives the reference signal, as shown in Figure 2 (left). Thus, the input and reference paths have the same impedance (Zin = Zref), rejecting CM signals in a pseudo-differential manner. After the initial amplification, the buffers drive the TDM switches and parasitics; then, the PGA provides further gain according to the experimental environment. The SAR ADC quantizes the output of the PGA and formats the data for the SPI.
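Two consequences of this architecture can be made concrete in a few lines: sharing one 500 kS/s ADC over a 16:1 TDM gives each channel 31.25 kS/s, and a matched replica-reference path cancels the common-mode term when its output is subtracted from a signal path. The Python sketch below is illustrative only. The round-robin channel order, the helper names, and the point at which the subtraction is taken (e.g., the fully differential ADC) are assumptions, not details given in the text; the gain of 100 is the mid-band gain chosen later in this section.

```python
ADC_RATE_SPS = 500e3  # full-speed SAR ADC rate quoted in the text
N_CHANNELS = 16       # 16:1 time-division multiplexing


def per_channel_rate(adc_rate: float = ADC_RATE_SPS, n_channels: int = N_CHANNELS) -> float:
    """Sampling rate seen by each channel when one ADC is shared round-robin."""
    return adc_rate / n_channels


def demultiplex(stream: list, n_channels: int = N_CHANNELS) -> list:
    """Split an interleaved sample stream into per-channel lists.

    Hypothetical frame order ch0, ch1, ..., ch15, ch0, ... (not given in the text).
    """
    if len(stream) % n_channels:
        raise ValueError("stream must contain whole TDM frames")
    return [stream[ch::n_channels] for ch in range(n_channels)]


def pseudo_differential(v_in: float, v_ref: float, v_cm: float, gain: float = 100.0) -> float:
    """Ideal matched single-ended LNAs on the signal and replica-reference paths.

    Subtracting the replica output cancels the common-mode term v_cm; where this
    subtraction happens on chip is an assumption of this sketch.
    """
    out_signal = gain * (v_in + v_cm)
    out_replica = gain * (v_ref + v_cm)
    return out_signal - out_replica


if __name__ == "__main__":
    print(per_channel_rate())              # per-channel sampling rate in S/s
    frames = demultiplex(list(range(32)))  # two frames of dummy sample indices
    print(frames[3])
    print(pseudo_differential(100e-6, 0.0, 50e-3))
```

In a real chip, mismatch between the two LNAs limits how well the CM term cancels; the ideal model only shows why matched impedances (Zin = Zref) matter.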

Area-Aware Design in Low Noise Amplifier
The key innovation of this work lies in the pixel circuit design (an LNA + a buffer), which improves the noise and power performance within the given area of 75 × 75 µm². The previous work [22,23] also achieved the same pixel size; however, it showed a relatively high IRN of ~6.3 µVrms due to the limited space while consuming relatively large power (~8.9 µW). In this work, we achieved a lower IRN and lower power consumption in the same area thanks to area-aware design techniques and the adoption of an operational transconductance amplifier (OTA) with local positive feedback.
A design methodology that considers the area consumption of an AC-coupled LNA was introduced in [30]. Assuming that the area of an AC-coupled amplifier is dominated by its passive components, the area (A) required for a differential AC-coupled LNA is given by Equation (1), where Cin, Cf, and CL are the input, feedback, and load capacitances, respectively, Cd is the capacitor density of the chosen technology node (usually 1–4 fF/µm²), and AM is the mid-band gain of the AC-coupled amplifier (= Cin/Cf). In addition, the input referred noise power and the noise bandwidth (NBW, ∆f) are given by Equations (2) and (3), respectively, where k is the Boltzmann constant, T is the absolute temperature in Kelvin, and Gm is the overall transconductance of the amplifier. Since CL affects ∆f and the input referred noise power must be calculated using ∆f, Equations (2) and (3) are combined and then substituted into Equation (1) to obtain Equation (4). Using Equation (4), one can design an area-optimized AC-coupled LNA while maintaining decent IRN performance. Although this equation is reasonable in general, it has limitations: it disregards two major noise contributors in the AC-coupled LNA, namely flicker (1/f) noise and noise multiplication (m = (Cin + Cf + Cp)/Cin, where Cp is the input parasitic capacitance), and thus leads to a sub-optimal design. Neglecting them may be acceptable when there is enough space to implement large input transistors that minimize 1/f noise (in that case, 1/f noise usually accounts for ~10% of the total IRN of a broadband neural recording amplifier (1–10,000 Hz) [6]); however, they cannot be neglected in designs where optimal use of area and power is essential, particularly when the given area is too small to allocate enough space for the input transistors.
In addition, the absolute values of the passives, Cin and Cf, cannot be made large when the given area is small; therefore, m is easily degraded by a large Cp. Taking the above into account, and considering that our LNA has a single input and output, as shown in Figure 3a, Equation (4) is modified into Equation (5).
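Why a fixed area, rather than the power budget, bounds the thermal IRN can be seen from a textbook first-order model. The derivation below is a sketch under stated assumptions and is not a reproduction of Equations (1)–(5): the closed loop is assumed to be single-pole with fH = Gm/(2π AM CL), the OTA thermal noise PSD is taken as 4kTγn/Gm with γn a hypothetical excess-noise factor, 1/f noise is ignored, and m is the noise multiplication defined above.

```latex
\begin{aligned}
f_H &= \frac{G_m}{2\pi A_M C_L}, \qquad
\Delta f = \frac{\pi}{2}\, f_H = \frac{G_m}{4 A_M C_L},\\[4pt]
\overline{v_{ni}^{2}} &\approx m^{2}\,\frac{4kT\gamma_n}{G_m}\,\Delta f
  = m^{2}\,\frac{kT\gamma_n}{A_M C_L},\\[4pt]
C_L &= C_{usable} - C_{in} - C_f = C_{usable} - C_{in}\left(1 + \frac{1}{A_M}\right),
\qquad m = 1 + \frac{1}{A_M} + \frac{C_p}{C_{in}}.
\end{aligned}
```

In this model Gm cancels: a larger bias current widens fH and ∆f in proportion and leaves the thermal IRN unchanged. Only AM·CL (that is, capacitor area) and m set it, and enlarging Cin to lower m takes capacitance away from CL.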
According to [30], Equation (1) implies that the optimal IRN and the optimal AM (AMopt) are found by varying AM. However, this optimization is unlikely to give the best result for the circuit blocks following the LNA, because a small AM incurs design penalties, such as higher noise or higher power consumption, in those blocks. Therefore, it is more appropriate to fix AM. We set AM to 100 by assuming a smaller Cf than the minimum capacitance provided by the given process (~35 fF), which is realizable to some extent with a capacitor T-network [25]. We also calculated the total capacitance available in the given area: considering the pixel area of 75 × 75 µm², the capacitor density (Cd = 2 fF/µm²), and the design rules of the selected 180 nm CMOS process, ~9 pF of metal-insulator-metal (MIM) capacitance is available.
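The capacitance budget in the preceding paragraph can be sketched numerically. Assuming a square 75 µm pixel and Cd = 2 fF/µm², the raw upper bound is 11.25 pF; the ~9 pF quoted in the text presumably reflects design-rule spacing overhead (our interpretation). The helpers below are hypothetical and also show, reading "~35 fF" as the process's minimum capacitor (again our interpretation), why AM = 100 pushes Cf below that minimum for Cin under 3.5 pF.

```python
PIXEL_PITCH_UM = 75.0         # pixel pitch from the text (pixel assumed square)
CAP_DENSITY_FF_PER_UM2 = 2.0  # MIM capacitor density C_d from the text
C_USABLE_PF = 9.0             # usable capacitance quoted after design rules


def raw_mim_budget_pf(pitch_um: float = PIXEL_PITCH_UM,
                      density_ff_per_um2: float = CAP_DENSITY_FF_PER_UM2) -> float:
    """Upper bound on MIM capacitance if the whole pixel were capacitor (pF)."""
    return pitch_um * pitch_um * density_ff_per_um2 / 1000.0  # fF -> pF


def split_capacitance(c_in_pf: float, a_m: float = 100.0,
                      c_usable_pf: float = C_USABLE_PF) -> tuple:
    """Return (C_f, C_L) in pF for a given C_in, with C_f = C_in / A_M."""
    c_f = c_in_pf / a_m
    c_l = c_usable_pf - c_in_pf - c_f
    if c_l <= 0:
        raise ValueError("C_in leaves no room for C_L")
    return c_f, c_l


def min_c_in_for_direct_cf(c_min_pf: float = 0.035, a_m: float = 100.0) -> float:
    """Smallest C_in (pF) whose C_f = C_in / A_M still meets the process minimum."""
    return c_min_pf * a_m


if __name__ == "__main__":
    print(f"raw budget: {raw_mim_budget_pf():.2f} pF")
    c_f, c_l = split_capacitance(3.0)
    print(f"C_in = 3 pF -> C_f = {c_f * 1e3:.0f} fF, C_L = {c_l:.2f} pF")
    print(f"C_in needed for C_f >= 35 fF: {min_c_in_for_direct_cf():.1f} pF")
```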

We then performed numerical simulations to observe the trade-offs described by Equation (5). Figure 4a shows the IRN and the high-frequency cutoff (fH, proportional to ∆f) of an LNA while sweeping Cin with the given total usable capacitance (Cusable ≈ 9 pF). For the simulation, a source-degenerated folded cascode OTA (FC-OTA) with a current consumption of a few µA was selected as the open-loop amplifier [24,31]. As Cin increases, fH monotonically increases due to the smaller CL (CL = Cusable − Cin − Cf). An optimal IRN exists because the noise bandwidth grows as Cin increases (to the right), whereas the noise multiplication (m) grows as Cin decreases (to the left). Contrary to the common belief in a trade-off between power consumption and noise, a larger power consumption does not help when the implementation area is fixed, because it increases the NBW. Another numerical simulation, shown in Figure 4b, clearly demonstrates this; it predicts how much power must be burned to achieve the smallest noise under the given area constraint. Here, fH is fixed at 10 kHz, so CL must grow as Gm increases, which leaves a smaller Cin. The IRN increases as Gm decreases, according to Equation (2); conversely, the IRN can also increase as Gm increases, because the smaller Cin raises m. Without our modification of Equation (4), the IRN minima in Figure 4 may not be found.
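The qualitative trend of Figure 4a can be reproduced with the first-order model: with CL = Cusable − Cin − Cf, sweeping Cin trades noise multiplication against noise bandwidth. The sketch below is not the authors' simulation and not Equation (5). It is a simplified thermal-only model (no 1/f noise). The parasitic Cp = 0.5 pF, g_m = 50 µA/V, and γn = 4/3 are hypothetical values; only Cusable = 9 pF and AM = 100 come from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def lna_thermal_irn(c_in, c_usable=9e-12, a_m=100.0, c_p=0.5e-12,
                    g_m=50e-6, temp=300.0, gamma_n=4.0 / 3.0):
    """Return (IRN in µVrms, f_H in Hz) for a simplified first-order LNA model.

    Assumptions: single-pole closed loop, OTA thermal PSD 4kT*gamma_n/g_m,
    no 1/f noise, C_L = C_usable - C_in - C_f with C_f = C_in / A_M.
    """
    c_f = c_in / a_m
    c_l = c_usable - c_in - c_f
    if c_l <= 0:
        raise ValueError("C_in leaves no capacitance for C_L")
    f_h = g_m / (2 * math.pi * a_m * c_l)  # closed-loop upper cutoff
    nbw = (math.pi / 2) * f_h              # noise bandwidth of a first-order roll-off
    m = (c_in + c_f + c_p) / c_in          # noise multiplication
    psd = 4 * K_B * temp * gamma_n / g_m   # OTA thermal noise PSD (V^2/Hz)
    return math.sqrt(m * m * psd * nbw) * 1e6, f_h


def sweep_c_in(start_pf=0.5, stop_pf=8.5, step_pf=0.1, **kwargs):
    """Sweep C_in and return a list of (C_in [F], IRN [µVrms], f_H [Hz])."""
    n = int(round((stop_pf - start_pf) / step_pf))
    points = []
    for i in range(n + 1):
        c_in = (start_pf + i * step_pf) * 1e-12
        irn, f_h = lna_thermal_irn(c_in, **kwargs)
        points.append((c_in, irn, f_h))
    return points


if __name__ == "__main__":
    pts = sweep_c_in()
    best = min(pts, key=lambda p: p[1])
    print(f"optimum C_in ~ {best[0] * 1e12:.1f} pF, IRN ~ {best[1]:.2f} µVrms, "
          f"f_H ~ {best[2] / 1e3:.1f} kHz")
```

Under these assumptions, the IRN minimum falls near Cin ≈ 2.3 pF and fH rises monotonically with Cin. Changing g_m moves fH but leaves the IRN unchanged, which mirrors the argument that extra power does not buy lower noise at a fixed area.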
As predicted in Figure 4, the fundamental relationships between Gm and fH (or ∆f), namely the linear proportionality of fH to Gm given by Equation (3) when the OTA is assumed to be a first-order system, and between the thermal noise power and Gm, make an AC-coupled neural recording amplifier with a decent IRN (<5 µVrms) appear very difficult to implement within the given small area of 75 × 75 µm².
This is evident from our previous FC-OTA-based LNA [22], which was designed according to the described design flow in the same 75 × 75 µm² area and showed a somewhat high IRN of >6 µVrms despite a relatively high power consumption of ~8.9 µW (~7.4 µA current).

Based on the previous discussion, to achieve a smaller IRN within the given implementation area, the fundamental relationship between Gm and ∆f must be broken. In this paper, we adopted a different type of OTA, shown in Figure 3b, whose Gm can be weakly coupled to the IRN and ∆f through a proper selection of the design parameters. The OTA in Figure 3b uses local positive feedback with M5,6 and an additional current gain with M9,10 to boost the overall Gm, given by Equation (6), where gm1 is the transconductance of the input transistors M1,2, and α and B are the geometry ratios of M6 to M3 and of M3 to M9, respectively. The output referred noise (ORN) and IRN of the OTA are given by Equations (7) and (8), where Rout is the output impedance of the OTA and γ is the transistor channel-noise coefficient. Unlike the conventional design of this OTA, where α < 1 and B > 1 for stable positive feedback with Gm boosting [32], we selected α < 1 and B < 1 to suppress the noise bandwidth, ∆f, while achieving decent noise performance. According to the last term in Equation (8), once B is chosen to be less than 1, the B-dependent term inevitably increases the IRN. In this case, however, its effect is minute, because gm7,9 and gm9 are usually made very small compared to gm1 to minimize power consumption, which is acceptable since neural recording applications do not require a high slew rate. We also performed numerical simulations to demonstrate this. As shown in Figure 5a, the IRN is proportional to (1 − α)^0.5. This follows because the PSD of the IRN is proportional to (1 − α)², while fH, and hence ∆f, is proportional to (1 − α)^−1, so the integrated noise power scales as (1 − α). Figure 5b depicts the effect of B on fH and the IRN. As B decreases, the IRN quickly increases because the PSD of the IRN is proportional to B^−2, according to Equation (8); on the other hand, as B increases, the IRN also increases due to the higher fH.
Because fH must be set not only by considering the IRN but also by the general bandwidth specification of neural recording, α = 0.7 and B = 0.1 were selected. Using the parameters from Figure 5, we designed an AC-coupled neural recording amplifier within the 75 × 75 µm² area. Figure 6 shows the IRN and fH versus Cin for the selected topology; it exhibits an optimal point similar to that in Figure 4a, but with a better noise performance of 4.47 µVrms.
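The (1 − α)^0.5 scaling follows from multiplying the two proportionalities stated above. The minimal sketch below assumes that every other parameter is held fixed and that the proportionalities hold over the whole range from α = 0 (no positive feedback) to α = 0.7; the text only shows them in Figure 5a. relative_irn is a hypothetical helper.

```python
import math


def relative_irn(alpha: float, alpha_ref: float = 0.0) -> float:
    """IRN(alpha) / IRN(alpha_ref) using only the stated alpha-dependences."""
    for a in (alpha, alpha_ref):
        if not 0.0 <= a < 1.0:
            raise ValueError("alpha must lie in [0, 1) for stable positive feedback")

    def noise_power(a: float) -> float:
        psd_scale = (1.0 - a) ** 2         # PSD of the IRN ∝ (1 - α)^2
        bandwidth_scale = 1.0 / (1.0 - a)  # f_H, hence Δf, ∝ (1 - α)^-1
        return psd_scale * bandwidth_scale

    return math.sqrt(noise_power(alpha) / noise_power(alpha_ref))


if __name__ == "__main__":
    for a in (0.0, 0.3, 0.5, 0.7):
        print(f"alpha = {a:.1f}: IRN x {relative_irn(a):.3f}")
```

Under these assumptions, the selected α = 0.7 scales the IRN by √0.3 ≈ 0.55 relative to α = 0.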

Improvement in Pixel Structure
A typical pixel circuit block in time-division multiplexed multichannel recording architectures consists of a low noise amplifier, a buffer, a programmable gain amplifier (PGA), and a low-pass filter (LPF), as shown in Figure 7a [33][34][35]. In the proposed 16:1 TDM architecture, two major improvements were made to reduce area and power consumption, as shown in Figure 7b. First, whereas the conventional pixel structure uses an additional LPF after the LNA to realize a fast high-frequency roll-off that reduces noise aliasing in the TDM, here a roll-off of ~−40 dB/decade is implemented within the LNA itself. A well-designed OTA can be regarded as a first-order system, and its phase margin is ~90° when configured in a feedback network, because the second pole (P2) is usually located at a much higher frequency than the dominant pole (P1), as indicated by the black lines in Figure 8a. However, in a neural recording LNA, which has a high closed-loop gain (~40 dB), in other words a small feedback factor (β), and operates purely in the continuous-time domain, such a high phase margin is unnecessary; ~60° of phase margin is sufficient [30,36]. Therefore, introducing the second pole early (P2', indicated in red), which reduces the phase margin somewhat, does not compromise loop stability. Instead, it provides a faster roll-off, so no additional LPF is required in the pixel structure. The top of Figure 8b shows the simulated Bode diagram (gain and phase) of the proposed OTA. As shown, the phase margin where the open-loop gain crosses the 40 dB (= 1/β) line is >80°, indicating that the closed-loop operation is stable. The bottom of Figure 8b shows the simulated closed-loop gain of the proposed LNA; thanks to the early second pole, it has a faster high-frequency roll-off without the aid of an additional LPF.
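The claim that an early second pole costs little stability at a 40 dB closed-loop gain can be checked with a two-pole open-loop model. The sketch below uses hypothetical values (A0 = 80 dB, P1 = 100 Hz, β = 0.01) rather than the paper's simulated OTA. It locates the crossover of |βA| numerically by geometric bisection and reports the phase margin and the closed-loop roll-off slope.

```python
import math


def open_loop_gain(f, a0, f_p1, f_p2):
    """Complex open-loop gain A(jf) of a two-pole OTA model."""
    return a0 / ((1 + 1j * f / f_p1) * (1 + 1j * f / f_p2))


def phase_margin_deg(a0, f_p1, f_p2, beta):
    """Return (phase margin in degrees, crossover frequency) where |beta*A| = 1."""
    lo, hi = 1e-3, 1e12
    if abs(beta * open_loop_gain(lo, a0, f_p1, f_p2)) <= 1.0:
        raise ValueError("loop gain never exceeds unity")
    for _ in range(200):  # bisection on a log-frequency axis
        mid = math.sqrt(lo * hi)
        if abs(beta * open_loop_gain(mid, a0, f_p1, f_p2)) > 1.0:
            lo = mid
        else:
            hi = mid
    f_c = math.sqrt(lo * hi)
    phase = -(math.atan(f_c / f_p1) + math.atan(f_c / f_p2))  # radians, no wrapping
    return 180.0 + math.degrees(phase), f_c


def closed_loop_slope_db_per_decade(f, a0, f_p1, f_p2, beta):
    """Slope of |A / (1 + beta*A)| between f and 10*f, in dB/decade."""
    def h(x):
        a = open_loop_gain(x, a0, f_p1, f_p2)
        return abs(a / (1 + beta * a))
    return 20 * math.log10(h(10 * f) / h(f))


if __name__ == "__main__":
    for f_p2 in (1e8, 3e4):  # far second pole vs. early second pole
        pm, f_c = phase_margin_deg(1e4, 100.0, f_p2, 0.01)
        slope = closed_loop_slope_db_per_decade(1e5, 1e4, 100.0, f_p2, 0.01)
        print(f"P2 = {f_p2:.0e} Hz: PM = {pm:.1f} deg at {f_c / 1e3:.1f} kHz, "
              f"roll-off = {slope:.1f} dB/dec")
```

With these values, moving P2 from 100 MHz down to 30 kHz lowers the phase margin from roughly 90° to roughly 73°, which is still above the ~60° target. Meanwhile, the closed-loop roll-off above the cutoff steepens from about −20 to about −40 dB/decade.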
In addition, the in-channel PGA has been moved after the analog multiplexer to avoid slew-limited operation of the TDM buffer [23]. Thanks to this modification, the buffer can be designed considering only linear settling. This configuration also helps the neural recording frontend save power and area. As depicted in Figure 7a, in the conventional design, the output of the in-channel PGA swings almost rail-to-rail to maximize the signal DR, so the following TDM buffer must be designed for slew-limited operation, which results in high power consumption. In contrast, by placing the PGA after the multiplexer, the input range of the TDM buffer becomes much smaller, estimated at ~200–300 mVpp at most (considering the maximum LFP amplitude of ~3 mV); in other words, the buffer can be designed by considering only linear settling.
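The benefit of moving the PGA can be estimated with I = C·dV/dt. The sketch below takes the ~3 mV maximum LFP amplitude and the LNA gain of 100 from the text, which gives a ~300 mV buffer swing. The 1.8 V rail-to-rail swing (a typical 180 nm supply), the 1 pF load, and the 0.5 µs slewing window are hypothetical values chosen only to show the ratio.

```python
LNA_GAIN = 100.0            # mid-band gain A_M chosen in the text
MAX_LFP_AMPLITUDE_V = 3e-3  # maximum LFP amplitude quoted in the text
ASSUMED_SUPPLY_V = 1.8      # hypothetical rail-to-rail swing for a 180 nm process


def tdm_buffer_swing_v(gain: float = LNA_GAIN,
                       v_signal: float = MAX_LFP_AMPLITUDE_V) -> float:
    """Largest signal seen by the TDM buffer when the PGA follows the multiplexer."""
    return gain * v_signal


def slew_current_a(delta_v: float, c_load_f: float, t_slew_s: float) -> float:
    """Current needed to slew delta_v across c_load_f within t_slew_s (I = C*dV/dt)."""
    return c_load_f * delta_v / t_slew_s


if __name__ == "__main__":
    c_load, t_slew = 1e-12, 0.5e-6  # hypothetical load and slewing window
    i_small = slew_current_a(tdm_buffer_swing_v(), c_load, t_slew)
    i_rail = slew_current_a(ASSUMED_SUPPLY_V, c_load, t_slew)
    print(f"PGA after MUX: {i_small * 1e6:.2f} uA, PGA in pixel: {i_rail * 1e6:.2f} uA")
```

Whatever load and window are assumed, the required slewing current scales with the swing, so a 300 mV swing needs about one sixth of the current that a 1.8 V swing would.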

Analog-to-Digital Converter and Serial Peripheral Interface
The design of the SAR ADC used in this work is adopted from [22,23]; thus, we only briefly explain the technique here. The fully differential 10-b SAR ADC is implemented in a smaller area by halving the size of the capacitor DAC (CDAC) compared with conventional top-plate sampling CDACs. With conventional top-plate sampling, a 4-bit resolution can be achieved with a total of 16C (= 4C + 2C + 1C + 1C per side, doubled for differential operation, where C is the unit capacitor), which, with the aid of bootstrapped switches, is 2× smaller than a 4-bit bottom-plate sampling CDAC, as shown in Figure 9a. In this implementation, the CDAC is halved again by introducing the dummy capacitor switching technique shown in Figure 9b, which yields a 5-bit resolution with the same total capacitance as in Figure 9a. Thus, 512C is sufficient for a 10-b differential DAC. In addition, in this implementation the input signal is sampled through simple transmission gates (TX gates) together with a switch that shorts the top plates, without bootstrapped switches. For energy-efficient operation, we adopted a VCM-based switching technique [37]; therefore, four switches are required per unit capacitor, as shown on the right of Figure 9b. The overall sampling capacitance is ~4.48 pF for fully differential operation. All switches and active components are placed under the MIM capacitors to further save area. By applying the VCM-based switching and terminating capacitor switching scheme, we implemented the fully differential 10-b SAR ADC in a small area of 75 × 350 µm² with a power consumption of ~6.02 µW at its full speed of 500 kS/s, equivalently ~0.377 µW per channel.
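The capacitor budget above follows from halving the array at each step. A minimal Python sketch, assuming the usual binary-weighted scaling (2^N unit capacitors per side for bottom-plate sampling, 2^(N−1) for top-plate sampling, and 2^(N−2) with dummy capacitor switching), reproduces the 16C and 512C figures. The unit-capacitor value is our inference, obtained by assuming that the ~4.48 pF sampling capacitance is the full 512C array.

```python
# Unit capacitors per side for an N-bit CDAC under each scheme
# (assumed binary-weighted scaling; see the lead-in).
SIDE_UNITS = {
    "bottom_plate": lambda n: 2 ** n,
    "top_plate": lambda n: 2 ** (n - 1),
    "top_plate_dummy": lambda n: 2 ** (n - 2),
}

def cdac_units_differential(n_bits, scheme):
    """Total unit capacitors for a fully differential N-bit CDAC."""
    return 2 * SIDE_UNITS[scheme](n_bits)

for scheme in SIDE_UNITS:
    print(scheme, [cdac_units_differential(n, scheme) for n in (4, 5, 10)])

# Inferred unit capacitance if ~4.48 pF (4480 fF) is the whole 512C array.
unit_cap_ff = 4.48e3 / cdac_units_differential(10, "top_plate_dummy")
print(f"unit capacitor ~ {unit_cap_ff:.2f} fF")
```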

Figure 10 shows the timing diagram of the implemented SPI slave. One SPI cycle corresponds to one ADC conversion at 500 kS/s (i.e., 2 µs) and includes 32 SCLKs. To relax the driving requirements of the SAR ADC and the TDM buffers, 0.875 µs and 2 µs of settling time are assigned to them, respectively. The SAR ADC output data are latched once and then serialized onto the MISO (master-in, slave-out) signal. The SPI slave is implemented with the standard cells provided by the chosen 180 nm CMOS technology using automatic placement and routing (APR), in an area of 75 × 450 µm².
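The timing budget implied by these numbers can be worked out in a few lines. This is a sketch under two assumptions of ours: one 32-SCLK SPI frame per 500 kS/s conversion, and a single ADC shared round-robin by the 16 channels (consistent with the per-channel ADC power quoted above, but not stated as such).

```python
# Hypothetical timing budget derived from the figures quoted in the text.
ADC_RATE_SPS = 500_000
SCLKS_PER_FRAME = 32
N_CHANNELS = 16

frame_ns = 1e9 / ADC_RATE_SPS                 # SPI frame period in ns
sclk_ns = frame_ns / SCLKS_PER_FRAME          # one SCLK period in ns
sclk_mhz = 1e3 / sclk_ns                      # SCLK frequency in MHz
adc_settle_sclks = 875 / sclk_ns              # 0.875 us ADC settling window
buffer_settle_sclks = 2000 / sclk_ns          # 2 us TDM-buffer settling window
per_channel_sps = ADC_RATE_SPS / N_CHANNELS   # per-channel sample rate

print(f"frame = {frame_ns:.0f} ns, SCLK = {sclk_mhz:.1f} MHz")
print(f"ADC settling = {adc_settle_sclks:.0f} SCLKs, buffer settling = {buffer_settle_sclks:.0f} SCLKs")
print(f"per-channel rate = {per_channel_sps:.0f} S/s, Nyquist = {per_channel_sps / 2:.0f} Hz")
```

Under these assumptions, SCLK runs at 16 MHz, the 0.875 µs ADC window spans 14 SCLK periods, and each channel is sampled at 31.25 kS/s, whose Nyquist frequency lies above the 7.4 kHz LNA corner.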

Experimental Results
The prototype 16-channel AC-coupled neural recording analog frontend chip was fabricated in a 180 nm 1P6M CMOS process. Figure 11 shows a microphotograph of the fabricated chip. One of the pixels is enlarged in an inset of Figure 11, clearly showing the pad opening for later flip-chip bonding to a neural probe using an anisotropic conductive film (ACF). Except for a few pads for the 0.8 V analog and 1.0 V digital supplies, ground, and SPI controls, the remaining pads are used only for benchtop characterization. The active area of the chip is 2.56 × 0.075 mm². The main functional blocks, such as the pixels, PGA, ADC, SPI, and bias block, are highlighted.
Figure 9. Conceptual diagram of successive approximation register (SAR) ADCs: (a) a conventional 4-bit top-plate switching SAR ADC; (b) the implemented 5-bit SAR ADC using the dummy capacitor switching technique.
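As a quick consistency check, dividing the active area by the channel count recovers the 0.012 mm² per-channel figure. Our reading (not stated explicitly in the text) is that the per-channel area is the full active area, including the shared PGA, ADC, SPI, and bias blocks, divided among the 16 channels.

```python
# Per-channel area from the active-area dimensions quoted above.
ACTIVE_W_MM, ACTIVE_H_MM = 2.56, 0.075
N_CHANNELS = 16

active_area_mm2 = ACTIVE_W_MM * ACTIVE_H_MM
area_per_channel_mm2 = active_area_mm2 / N_CHANNELS
print(f"active area = {active_area_mm2:.3f} mm^2, per channel = {area_per_channel_mm2:.4f} mm^2")
```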

Figure 12a shows the frequency response of the fabricated LNA measured with a spectrum analyzer (35670A, Keysight). The mid-band gain was measured as 40 dB, and the low (fL) and high (fH) corner frequencies were 0.1 Hz and 7.4 kHz, respectively. A −20 dB/dec slope was observed near fH, and the roll-off soon steepened to −40 dB/dec owing to the early-introduced second pole. The total harmonic distortion (THD) was also measured by sweeping the amplitude of a 1 kHz input sinewave, as shown in Figure 12b; the THD remains below 1% for input amplitudes below 8.6 mVpp. This relatively large input range with <1% THD is achieved thanks to the reference-replica technique, which forms a differential-difference amplifier structure.
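The <1% THD criterion can be made concrete with a short sketch of the definition THD = sqrt(V2² + V3² + …)/V1. It estimates the harmonic amplitudes of a coherently sampled record with single-bin DFTs. The record is synthetic with hypothetical harmonic levels, and the analyzer's internal THD algorithm is not described in the text, so this only illustrates the metric.

```python
import math

def tone_amplitude(x, cycles):
    """Amplitude of the component completing `cycles` periods over the record.

    Single-bin DFT; exact for coherently sampled tones (integer cycles).
    """
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * cycles * i / n) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * cycles * i / n) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / n

def thd_percent(x, fund_cycles, n_harmonics=5):
    """THD = sqrt(sum of harmonic powers) / fundamental, in percent."""
    fund = tone_amplitude(x, fund_cycles)
    harm = [tone_amplitude(x, k * fund_cycles) for k in range(2, n_harmonics + 2)]
    return 100.0 * math.sqrt(sum(h * h for h in harm)) / fund

# Synthetic record (hypothetical): a tone with 0.5% 2nd and 0.3% 3rd harmonics,
# sampled coherently (8 cycles in 1024 samples).
N, CYCLES = 1024, 8
x = [math.sin(2 * math.pi * CYCLES * i / N)
     + 0.005 * math.sin(2 * math.pi * 2 * CYCLES * i / N)
     + 0.003 * math.sin(2 * math.pi * 3 * CYCLES * i / N) for i in range(N)]
print(f"THD = {thd_percent(x, CYCLES):.3f} %")
```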
Figure 13a depicts the IRN spectrum from 0.4 Hz to 50 kHz. The noise floor and the 1/f corner were ~35 nV/√Hz and ~1 kHz, respectively. The total integrated IRN was calculated as 4.27 µVrms from 0.4 Hz to 50 kHz, which agrees well with the transient noise measurement shown in Figure 13b. In the transient noise measurement, 5 s of data were collected at a sampling rate of 12.5 MS/s.
The ADC was also characterized on the bench. Figure 14a shows the signal-to-noise-and-distortion ratio (SNDR) and the spurious-free dynamic range (SFDR) for input frequencies from 10 Hz to near the Nyquist rate (249 kHz). Both the SNDR and SFDR begin to degrade for input frequencies above 50 kHz but still show decent performance. Figure 14b shows an example fast Fourier transform (FFT) for a 5 kHz input. The calculated effective number of bits (ENOB) is 8.9 bits for a 1 kHz input sinewave.
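The integrated IRN and ENOB figures above come from two standard calculations: integrating the noise PSD over frequency, and converting between SNDR and ENOB with ENOB = (SNDR − 1.76 dB)/6.02 dB. The sketch below shows both. The PSD model is hypothetical: it uses the ~35 nV/√Hz floor and ~1 kHz 1/f corner quoted above plus an assumed second-order roll-off at fH = 7.4 kHz, so its output is a model estimate, not a reproduction of the measured spectrum in Figure 13a.

```python
import math

def integrated_rms(psd, f_lo, f_hi, n=20000):
    """Integrate a one-sided noise PSD (V^2/Hz) over [f_lo, f_hi]; return Vrms.

    Trapezoidal rule on a logarithmically spaced grid, which suits
    1/f-shaped spectra spanning several decades.
    """
    ratio = (f_hi / f_lo) ** (1.0 / n)
    total, f_prev, p_prev = 0.0, f_lo, psd(f_lo)
    for i in range(1, n + 1):
        f = f_lo * ratio ** i
        p = psd(f)
        total += 0.5 * (p + p_prev) * (f - f_prev)
        f_prev, p_prev = f, p
    return math.sqrt(total)

def enob_from_sndr(sndr_db):
    """Standard conversion ENOB = (SNDR - 1.76 dB) / 6.02 dB."""
    return (sndr_db - 1.76) / 6.02

def sndr_from_enob(enob_bits):
    """Inverse of enob_from_sndr."""
    return 6.02 * enob_bits + 1.76

# Hypothetical PSD model: white floor, 1/f corner, assumed 2nd-order roll-off.
FLOOR, F_CORNER, F_H = 35e-9, 1e3, 7.4e3

def lna_psd(f):
    return FLOOR ** 2 * (1.0 + F_CORNER / f) / (1.0 + (f / F_H) ** 4)

print(f"model IRN (0.4 Hz-50 kHz) ~ {integrated_rms(lna_psd, 0.4, 50e3) * 1e6:.2f} uVrms")
print(f"SNDR implied by ENOB = 8.9 b: {sndr_from_enob(8.9):.2f} dB")
```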
The measured differential nonlinearity (DNL) and integral nonlinearity (INL) of the 10-b SAR ADC lie within +0.15/−0.35 LSB and +1.75/−2.15 LSB, respectively.
Figure 15a shows the gain distribution of a complete analog frontend channel, including the LNA and the PGA at a 9 dB gain setting. The mean channel gain is ~278.63 V/V (nominal: 40 dB + 9 dB ≈ 281.84 V/V), with a standard deviation of ~3.65 V/V across 12 samples. The variations of the IRN, fL, and fH were also measured as ~4.2 ± 0.08 µVrms, ~0.097 ± 0.0038 Hz, and ~7.4 ± 0.06 kHz, respectively. The relatively high variation in fL (~8%) may stem from the pseudo-resistor formed by near-off transistors; however, such variation is likely acceptable for neural recordings because fL remains low enough to capture the LFPs.
Figure 15b shows the breakdown of power consumption by block. The per-channel power consumptions are ~3.44 µW and ~0.58 µW from the 0.8 V analog and 1.0 V digital supplies, respectively. The LNA, which determines the overall noise performance, consumes the largest share, ~2 µW.
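The gain and spread figures above can be cross-checked with dB/V/V conversions. In this sketch, the 2σ column is our own addition: one possible reading of the "~8%" fL variation is the ±1σ span relative to the mean, which the text does not state.

```python
import math

def db_to_vv(db):
    """Convert a voltage gain in dB to V/V."""
    return 10 ** (db / 20.0)

def vv_to_db(vv):
    """Convert a voltage gain in V/V to dB."""
    return 20.0 * math.log10(vv)

nominal_vv = db_to_vv(40 + 9)          # LNA 40 dB + PGA 9 dB
mean_vv, std_vv = 278.63, 3.65         # measured over 12 samples (from the text)

print(f"nominal = {nominal_vv:.2f} V/V, measured mean = {vv_to_db(mean_vv):.2f} dB")
print(f"gain error = {100 * (mean_vv / nominal_vv - 1):.2f} %, "
      f"spread (1 sigma) = {100 * std_vv / mean_vv:.2f} %")

# Relative spreads of the corner frequencies (mean, 1 sigma) quoted in the text.
for name, mean, sigma in (("fL", 0.097, 0.0038), ("fH", 7.4e3, 0.06e3)):
    print(f"{name}: 1 sigma = {100 * sigma / mean:.1f} %, 2 sigma = {200 * sigma / mean:.1f} %")
```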

In Vitro Characterization and Performance Comparison
To further characterize the functionality of the fabricated 16-channel AC-coupled neural recording frontend, we performed measurements in phosphate-buffered saline (PBS) by applying a set of pre-recorded neural data including LFP and EAP signals. The measurement was performed with a gain of ~280 V/V (40 dB from the LNA and 9 dB from the PGA), and the circuit was battery-powered. The data were collected in real time by custom software (LabVIEW, National Instruments). The top and bottom of Figure 16 show, respectively, the input-referred neural signals, including LFPs and EAPs (spikes), and the spikes band-pass filtered in software (MATLAB, MathWorks; fL = 250 Hz and fH = 7.5 kHz).
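The input referral and software band-pass step can be sketched as follows. This is a minimal pure-Python stand-in, not the authors' MATLAB processing: the filter type and order are not given in the text, so we assume a second-order Butterworth high-pass at 250 Hz cascaded with a second-order Butterworth low-pass at 7.5 kHz (RBJ biquad formulas), a per-channel rate of 31.25 kS/s (our inference from the 500 kS/s ADC shared by 16 channels), and the ~280 V/V channel gain. The helper names are hypothetical.

```python
import math

def biquad_coeffs(kind, f0, fs, q=1 / math.sqrt(2)):
    """RBJ cookbook 2nd-order low-/high-pass coefficients, normalized by a0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cw = math.cos(w0)
    if kind == "lowpass":
        b = [(1 - cw) / 2, 1 - cw, (1 - cw) / 2]
    elif kind == "highpass":
        b = [(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]
    else:
        raise ValueError(kind)
    a0 = 1 + alpha
    a = [a0, -2 * cw, 1 - alpha]
    return [bi / a0 for bi in b], [ai / a0 for ai in a]

def biquad_filter(x, b, a):
    """Direct-form-I IIR filter (a[0] already normalized to 1)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def spike_band(x_out, fs, gain_vv=280.0, f_lo=250.0, f_hi=7.5e3):
    """Refer output samples to the input, then band-pass them for spike viewing."""
    x_in = [v / gain_vv for v in x_out]
    y = biquad_filter(x_in, *biquad_coeffs("highpass", f_lo, fs))
    return biquad_filter(y, *biquad_coeffs("lowpass", f_hi, fs))

FS = 31_250.0  # assumed per-channel sample rate (S/s)

def steady_peak(freq, amp_out=1.0, seconds=1.0):
    """Peak input-referred output for a test sine, ignoring the first half (transient)."""
    n = int(FS * seconds)
    x = [amp_out * math.sin(2 * math.pi * freq * i / FS) for i in range(n)]
    y = spike_band(x, FS)
    return max(abs(v) for v in y[n // 2:])

print(f"30 Hz (LFP-like) peak, input-referred: {steady_peak(30.0) * 1e6:.1f} uV")
print(f"2 kHz (spike band) peak, input-referred: {steady_peak(2e3) * 1e6:.1f} uV")
```

A zero-phase (forward-backward) filter would avoid distorting spike shapes; the single forward pass here is kept only for brevity.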

The performance of this work is compared with recent state-of-the-art works in Table 1. This work achieves a good channel FoM and E-A FoM of 147.87 fJ/c-s and 1.77 fJ/c-s·mm², respectively, while key specifications for neural recording, such as the bandwidth and IRN, are comparable to or better than those of other recent works. In particular, this work shows the smallest footprint, 0.012 mm² per channel, except for [11]. Whereas [11] counts only the active circuits in its area estimate, the area reported here also includes the LNA input pads that will be coupled to a neural probe; thus, this work arguably uses its area more effectively than [11]. This is particularly important in multichannel neural recording systems with a high channel count because, as the channel count grows, the overall chip area is ultimately limited by the interconnection with the neural probe rather than by the active circuits [23].
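The two figures of merit quoted above are consistent with the E-A FoM being the channel FoM multiplied by the per-channel area. The exact FoM definitions accompany Table 1 and are not reproduced in this section, so the product relation below is our reading of the numbers rather than a formula stated here.

```python
# E-A FoM as channel FoM x per-channel area (relation inferred from the numbers).
CHANNEL_FOM_FJ = 147.87     # fJ/c-s
AREA_PER_CH_MM2 = 0.012     # mm^2 per channel, including LNA input pads

ea_fom = CHANNEL_FOM_FJ * AREA_PER_CH_MM2
print(f"E-A FoM = {ea_fom:.3f} fJ/c-s*mm^2")
```

Under this relation, any area saving translates directly into a lower (better) E-A FoM, which is why counting the input pads in the area, as done here, is the more conservative comparison.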

Conclusions
In this paper, we presented an energy- and area-efficient AC-coupled 16-channel analog frontend for multichannel recording of broadband neural signals. To achieve such a small area-energy product, we devised an improved area-aware design approach, especially useful for highly area-constrained neural recording LNAs, and employed an OTA with local positive feedback with a differentiated choice of design parameters, optimizing the parameters that govern the performance, power, and area trade-offs. In addition, a second pole was strategically introduced inside the LNA to reduce the noise bandwidth without an in-channel low-pass filter, further saving area and energy. As a result, compared with our previous work and other conventional designs, the presented prototype shows better performance in terms of noise, power, and area. The prototype, fabricated in a 180 nm 1P6M CMOS process, consumes 3.44 µW and 0.012 mm² per channel while achieving an IRN of 4.27 µVrms, with a channel FoM of 147.87 fJ/c-s and an E-A FoM of 1.77 fJ/c-s·mm².