Winner-Take-All and Loser-Take-All Circuits: Architectures, Applications and Analytical Comparison

Abstract: Different winner-take-all (WTA) and loser-take-all (LTA) circuits are studied, and their operation is analyzed in this review. The current conveyor, binary tree, and time-domain WTA/LTA architectures, the most important architectures reported in the literature, are compared from the perspectives of power consumption, speed, and precision.


Introduction
WTA/LTA circuits are used to determine the maximum or minimum out of multiple inputs. These units are among the fundamental blocks for realizing neural networks, data classification/clustering approaches, and image processing algorithms in complementary metal oxide semiconductor (CMOS) technology. Unsupervised learning networks are also implemented using WTA/LTA circuits [15], whose applications span from generative adversarial networks to ladder networks and variational autoencoders [45]. Fuzzy logic control [2,27], rectifiers [12,32,37], artificial neural networks (ANN) [3,36], associative memory [8], neuromorphic systems [44], vision sensors [40,46], nonlinear filters [31], and telecommunication circuits [5] are among the other applications that contain WTA/LTA units. Both sampled-input and continuous-time WTAs/LTAs exist. For continuous-time WTAs/LTAs, speed is defined as the maximum frequency up to which the circuit can maintain a designated precision/accuracy. Setting aside the differences between the definitions in the literature, the resolution of a WTA/LTA specifies the minimum detectable input, while accuracy refers to the allowable error over the maximum input range through which the circuit can function correctly. These parameters are related to precision, which specifies how reliably a circuit can reproduce the same resolution over a prescribed range.
WTA/LTA circuits can be classified from different perspectives, depending on the type of input (voltage or current), output (the winning signal or its index), and circuit architecture. Regardless of the classification, three approaches are commonly used to integrate these circuits, as graphically shown in Figure 1. As depicted in Figure 1a, the solutions in the first category rely on a parallel operation that employs N identical units fed by the inputs. The output equals the winning signal, although its address is not specified. Known as current conveyors (CC), these circuits have an efficiency that depends on the number of inputs, since complexity in this group is an exponential function of N, despite the simple approach used to integrate CC units [10,26]. "Corner error" occurs when two or more inputs show similar values, causing the output of the current conveyors to converge to an average value rather than the winning input [12,28,39]. This error, when combined with fundamental drawbacks such as a high supply voltage requirement and reduced bandwidth, restricts the applications of the classical CC WTAs in modern systems [28]. Some recent solutions aim to overcome the challenges related to the supply voltage and frequency restrictions [7,22,23,24], as will be discussed later in this review.
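As a behavioral reference for the definitions above, the following minimal sketch models an ideal WTA and LTA that report both the winning value and its index. The function names are hypothetical, and the `distinguishable` helper assumes one possible reading of resolution (the smallest input difference that can still be told apart), which is an illustrative assumption rather than a definition taken from the literature reviewed here.

```python
from typing import Sequence, Tuple


def winner_take_all(inputs: Sequence[float]) -> Tuple[float, int]:
    """Ideal WTA: return (value, index) of the largest input."""
    if not inputs:
        raise ValueError("at least one input is required")
    index = max(range(len(inputs)), key=lambda i: inputs[i])
    return inputs[index], index


def loser_take_all(inputs: Sequence[float]) -> Tuple[float, int]:
    """Ideal LTA: return (value, index) of the smallest input."""
    if not inputs:
        raise ValueError("at least one input is required")
    index = min(range(len(inputs)), key=lambda i: inputs[i])
    return inputs[index], index


def distinguishable(a: float, b: float, resolution: float) -> bool:
    """Assumed reading of resolution: two inputs can be told apart
    only if they differ by at least `resolution`."""
    return abs(a - b) >= resolution
```

A real circuit deviates from this ideal model through mismatch, finite gain, and limited bandwidth, which is what the resolution, accuracy, and speed figures discussed in this review quantify.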
The second group of WTA/LTA circuits deals with the concept of binary tree (BT) operation. Figure 1b illustrates the configuration of a BT WTA/LTA module. Signals are applied in pairs to the input cells, and only one signal of each pair is considered the winner that takes part in the competition of the next layer. Unlike the CC circuits, the BT solutions not only extract the winner but can also find the address of the winning signal [26]. Not only is the resolution degraded by the number of inputs, but BT circuits are also plagued by relatively high propagation delay, excessive complexity, and larger power consumption and silicon area. Given these limitations, the BT solution is adopted when precision is the prevailing concern. Amplifying the inputs prior to comparison enables the unit to reach higher resolutions in a shorter decision time, irrespective of the architecture.
The third group in this review involves the time-domain WTAs (TDWTAs), which convert the input current/voltage to delayed pulses according to the implementation in Figure 1c. A phase detector (PD) or a time comparator is used to identify the first delayed pulse that reaches it. Better performance in low-voltage environments is the main advantage of this category. The digital nature of the circuits makes them more compatible with nano-scale CMOS technology, which in turn lowers power consumption. However, the nonlinearity introduced when converting the input to a time-domain signal can reduce the accuracy.
With the above background in mind, we shall review and analyze the different categories of WTA/LTA circuits in Section 2. A comprehensive comparison of different structures will be provided in Section 3, and conclusions will be drawn in Section 4.

Literature Review
This section is organized according to the classification of WTA/LTA circuits introduced in Section 1. The solutions are described in chronological order.

Current Conveyors
The current conveyor (CC) WTA circuit depicted in Figure 2 was originally proposed by Lazzaro et al. in [1]. Known as Lazzaro's circuit, it is composed of N input cells in which all MOSFETs operate in weak inversion. Every cell consists of a voltage follower (Mi2) and a common-source transistor (Mi1) connected in a negative feedback loop. The voltage Vi at the input of Mi2 increases when the current Ii is greater than the rest. This raises the common voltage Vc and reduces the gate-source voltage (VGS) of all voltage followers except Mi2, switching off the corresponding devices as a result. The voltage Vc will eventually be proportional to the highest input, and the output current Io can be generated through Vc coupled to the gate of the output transistor (Mo). One problem of Lazzaro's circuit is the presence of interconnection parasitics, which slow down the operation. Another shortcoming is the reduced precision as the number of cells increases, since all cells must remain identical to keep the mismatch low. Matching also trades off against the device sizes and, consequently, the silicon area.
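The way Vc encodes the winning input can be illustrated with a small numerical sketch. It assumes the textbook weak-inversion law, in which the drain current depends exponentially on the gate-source voltage, and it assumes that only the winning cell's common-source device remains saturated and therefore sets Vc. The constants `U_T`, `KAPPA`, and `I_0` and all function names are illustrative assumptions, not values from [1]. Under this idealized model Vc tracks the highest input logarithmically, and the output transistor Mo, driven by the same Vc, reproduces the winning current.

```python
import math

U_T = 0.0259    # thermal voltage near 300 K, in volts (approximate)
KAPPA = 0.7     # assumed subthreshold slope factor (illustrative)
I_0 = 1e-15     # assumed pre-exponential current, in amperes (illustrative)


def subthreshold_current(v_gs: float) -> float:
    """Weak-inversion drain current of a saturated device (idealized)."""
    return I_0 * math.exp(KAPPA * v_gs / U_T)


def common_voltage(input_currents) -> float:
    """Vc set by the winning cell: invert the exponential law for max(I)."""
    i_max = max(input_currents)
    if i_max <= 0:
        raise ValueError("input currents must be positive")
    return (U_T / KAPPA) * math.log(i_max / I_0)


def output_current(input_currents) -> float:
    """Current of the output device Mo, whose gate is driven by Vc."""
    return subthreshold_current(common_voltage(input_currents))
```

The sketch ignores mismatch, parasitics, and the residual current of the losing cells, which are precisely the effects that limit the precision and speed of the real circuit.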

Several advanced implementations of WTA/LTA circuits have been reported in CMOS technology. A high-precision approach is introduced in [3] for improving the accuracy of Lazzaro's circuit, aiming at processing more than 1024 inputs of a real-world scientific or industrial application. The circuit is capable of specifying the index of the winning signal as well as its value in the voltage domain. Its inputs are analyzed by two layers. The voltages applied to a common voltage are converted to currents within the first layer, and the currents are used to generate a proportional voltage. The largest input significantly reduces the other currents by raising the common voltage. The second layer then saturates the winning signal up to the positive supply rail by enhancing the overall gain factor. In [9], each cell employs an auxiliary transistor cascaded with a sink current source, as illustrated in Figure 3, improving the resolution of Lazzaro's circuit by enlarging the gain factor. In return, the voltage range is reduced by the presence of the cascaded transistor.
In [13], the input currents are copied to NMOS and PMOS mirrors, as shown in Figure 4. The summation of the mirrored NMOS currents then flows into each cell to be compared with the inputs. The result of each comparison controls the output currents of the cells. This alters the total current flowing into the current comparators, and the procedure continues until only one current greater than the total one remains. The high-speed, high-precision WTA circuit reported in [14] incorporates an N-input current maximum selector in its input layer. The input stage produces N current outputs, which are mirrored into a feedback circuit that produces the feedback current. The feedback current is used to correct the corner error of the maximum circuit. The output stage is formed by N high-speed current comparators that provide a binary output for each input. This way, only the output corresponding to the winning input shows a logical "1".
The solution in [20] contains inhibitory and excitatory feedback that prevent the selection of the potential winners. Each cell consists of 12 transistors connected to the common node Vc, according to the illustration in Figure 5. The input current is copied and compared with the average current of all cells. For the largest input, node Vx decreases such that its output exhibits a logical "1". The inhibitory feedback decreases Vc of the other cells, increasing Vx such that a logical "0" appears at their outputs. The excitatory feedback has the opposite impact on the winning signal. Node Vx of the winning cell is consequently reduced by increasing the input current. Since the input current is compared with the average of all inputs, the inhibitory and excitatory feedback provide a hysteretic mechanism that prevents the selection of a potential winner unless it is stronger than the present selection [20]. With a wide input current range, this mechanism is well-suited for high-speed, high-precision applications.
The current-mode circuit developed in [21] is based on Lazzaro's WTA circuit, seeking to increase the accuracy in low-voltage environments. As depicted in Figure 6, each input voltage follower in the original circuit is replaced by a flipped voltage follower (FVF). An FVF is essentially a voltage follower (MAi) that includes a negative shunt feedback (via MCi), enabling it to sink large currents while keeping the voltage of the current-sensing device constant. All the FVF cells are coupled to a low-impedance common node Vc. The implementation is essentially a maximum current selector, since its output current Io follows the maximum among I1 to In. Its main advantage is the modest supply voltage requirement of VGS + 2Vov, in which Vov is the overdrive voltage of the transistors.
Proposed in [25], the current-mode LTA solution in Figure 7 includes MoA as a voltage-controlled current source, with node U common to all MiA devices. Within each cell, MiC converts the input current Ii into a proportional drain voltage. The source-to-gate voltages of MiB compete at node U, and the maximum voltage, corresponding to the smallest input current, is considered the winner [25]. The architecture is simple, low-power, and modular.
In [27], a voltage-mode WTA is developed with excitatory and inhibitory feedback based on the original WTA core in [3]. Figure 8 shows the scheme of the WTA cells, in which M8 and M9 constitute the excitatory and inhibitory circuits, respectively. The additional feedback enhances the resolution without introducing any extra stage. Another simple voltage-mode architecture for detecting the maximum and minimum inputs is reported in [32]. Its tiny size, with a minimum number of transistors, makes it ideal for high-frequency applications. The circuit combines differential amplification with shunt feedback in the voltage buffer, such that the output voltage follows the winning input, although its address is not specified. The solution proposed in [34] utilizes a common-gate transistor to enhance its open-loop gain factor. High accuracy levels can thus be reached in low-voltage environments.
Figure 9 exhibits the current-mode LTA circuit proposed in [38]. The role of the triode-region transistor Mwi in each cell is to establish an effective feedback mechanism. The minimum input current generates the largest voltage at Ci, and the relevant Mui sinks current from Ib so as to copy the lowest input current to the output. High-speed operation can be reached with high accuracy levels at the expense of more power consumption and area. The minimum supply voltage is also increased because of the cascade current mirrors. Figure 10 presents another derivation of Lazzaro's circuit with speed and accuracy advantages [37]. The output impedance at node Vc is decreased by the additional feedback loops through Mi3. The circuit shows superior performance with respect to the original Lazzaro's circuit. However, power consumption and area are increased because of the additional branches.
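The hysteretic selection attributed to [20] above can be pictured with a purely behavioral sketch: a selected winner keeps its position until a challenger exceeds it by some margin. The class name and the `margin` parameter are hypothetical, and the model deliberately omits the comparison against the average input current and every transistor-level detail, so it should be read as an illustration of hysteresis in a WTA rather than as the circuit of [20].

```python
class HystereticWTA:
    """Behavioral WTA that replaces the winner only for a clearly stronger input."""

    def __init__(self, margin: float):
        if margin < 0:
            raise ValueError("margin must be non-negative")
        self.margin = margin   # assumed hysteresis width (illustrative)
        self.winner = None     # index of the currently selected cell

    def update(self, inputs):
        """Return one binary output per input; exactly one output is 1."""
        if not inputs:
            raise ValueError("at least one input is required")
        challenger = max(range(len(inputs)), key=lambda i: inputs[i])
        if self.winner is None or self.winner >= len(inputs):
            # No valid selection yet: pick the largest input.
            self.winner = challenger
        elif inputs[challenger] > inputs[self.winner] + self.margin:
            # The challenger is stronger than the selection by more than the margin.
            self.winner = challenger
        return [1 if i == self.winner else 0 for i in range(len(inputs))]
```

With a margin of 0.5, for example, an input that exceeds the selected one by 0.2 does not change the outputs, whereas one that exceeds it by 0.6 takes over the logical "1".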

Binary Tree WTA Circuits
The input signals of the BT topologies are coupled in pairs, and only one signal of each pair is considered a local winner. The winner takes part in the competition of the next layer until the global winner is specified. Increasing the number of inputs does not affect the accuracy of BT topologies. Nevertheless, the area, power, and delay are increased. During the 1990s, binary-tree WTAs were used widely in applications such as nonlinear filters, analog-to-digital converters, vector quantizers, and fuzzy circuits [2,4,6,10]. The voltage-mode binary-tree WTA in Figure 11 is presented in [17]. The initial comparison is performed between two random inputs. The greater input voltage is directed to the output, and a digital output is preserved for its address. The output of the first stage is then applied to the second stage for comparison. This procedure continues iteratively until the largest input is determined together with its address.
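The pairwise competition described above can be sketched as a tournament in which every contender carries its original address. The function name, the tie-breaking rule (the first input of a pair wins a tie), and the handling of an unpaired input (it advances unchanged) are assumptions made for illustration; they are not taken from [17].

```python
from typing import Sequence, Tuple


def binary_tree_wta(inputs: Sequence[float]) -> Tuple[float, int, int]:
    """Return (winning value, winning address, number of layers used)."""
    if not inputs:
        raise ValueError("at least one input is required")
    # Each contender is a (value, original address) pair.
    contenders = [(value, address) for address, value in enumerate(inputs)]
    layers = 0
    while len(contenders) > 1:
        next_layer = []
        # Compare neighbouring contenders two at a time.
        for k in range(0, len(contenders) - 1, 2):
            a, b = contenders[k], contenders[k + 1]
            next_layer.append(a if a[0] >= b[0] else b)
        if len(contenders) % 2 == 1:
            # An unpaired contender advances without a comparison.
            next_layer.append(contenders[-1])
        contenders = next_layer
        layers += 1
    value, address = contenders[0]
    return value, address, layers
```

For eight inputs the loop runs three times, which reflects why the delay of BT topologies grows with the number of inputs even though every individual comparison stays a simple two-input decision.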
Another current-mode WTA presented in [23] can operate at low supply voltages down to 0.5 V. The circuit is composed of a transresistance comparator and a few current mirrors and is utilized for the learning of Kohonen's network. Figure 12 illustrates another binary-tree WTA developed in [26]. It consists of front-end current-to-time converters followed by time comparators. The input currents are converted to time delays (a delayed pulse in which the delay is proportional to the input), and the time comparators compare the delayed input pulses. The larger inputs are then determined and directed to the next layer, so that the largest input current is finally determined. The main advantages of this circuit are its low-power and low-voltage operation. However, speed is a challenge for the described topology.
In [29], a translinear loop is utilized to amplify the difference between the two inputs prior to comparison. A positive feedback loop is also used to improve the comparison accuracy. The operation of its transistors in the sub-threshold region not only reduces its power consumption but also enhances the precision as compared to the early solutions. However, similar to other BT circuits, speed is a challenge. Another current-mode binary-tree WTA circuit is presented in [28], where a modified current comparator and mirroring scheme are exploited to improve both latency and accuracy. As shown in Figure 13a, a block denoted by MIMA2 (Figure 13b) is used to compare the input currents in this solution. The main idea is to stimulate MIMA2 such that it sends information regarding the winning signal through the LOGIC block back to the INPUT block. In response, the INPUT block passes another copy of the winning signal to the next layer. This architecture benefits from a lower propagation delay.
Other binary-tree WTA/LTA topologies have been introduced for spiking neural networks (SNN) or neuromorphic applications [42,44]; they suffer from excessive delay and larger areas and will not be described here for the sake of brevity.

Time-Domain WTA/LTA Circuits
Time-domain solutions are becoming more popular due to their compatibility with low-voltage CMOS technology. A number of time-domain WTA configurations have been reported in the literature [12,33,48,49]. The recent time-domain configurations are becoming comparable with the CC and BT solutions in terms of speed, power, and resolution. The first time-domain WTA circuit to be discussed in this section is based on self-resetting integrate-and-fire neurons [19]. Each neuron functions as a WTA cell, according to Figure 14. The internal capacitor (Csoma) is charged by the input current of the cell. The larger the current, the faster the capacitor charges. The first neuron that reaches the threshold switching voltage of the inner inverter pulls up the output and generates an output spike. This first spike resets the other cells and forces their outputs to zero until the next sampling time. Large capacitors are needed for this circuit to reach higher resolutions, which affects its speed.
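The trade-off between resolution and speed in the integrate-and-fire scheme can be made concrete with a first-order sketch. It assumes an ideal capacitor charged from 0 V by a constant input current, so that the time to reach the threshold is t = C * Vth / I. The function names and the numerical values used in any call are hypothetical; leakage, the inverter's switching behavior, and the reset path are ignored.

```python
def firing_time(current: float, c_soma: float, v_threshold: float) -> float:
    """Time for a constant current to charge c_soma from 0 V to v_threshold."""
    if current <= 0:
        raise ValueError("current must be positive")
    return c_soma * v_threshold / current


def time_domain_wta(currents, c_soma: float, v_threshold: float):
    """Return (index, firing time) of the cell that spikes first."""
    times = [firing_time(i, c_soma, v_threshold) for i in currents]
    winner = min(range(len(times)), key=lambda k: times[k])
    return winner, times[winner]


def time_separation(i_a: float, i_b: float, c_soma: float, v_threshold: float) -> float:
    """Gap between the firing times of two cells; a larger gap is easier to resolve."""
    return abs(firing_time(i_a, c_soma, v_threshold)
               - firing_time(i_b, c_soma, v_threshold))
```

In this model both the firing time and the gap between two nearby inputs scale linearly with the capacitance: doubling Csoma doubles the time separation that the following stage must resolve, but it also doubles the time needed to reach a decision.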

Time-Domain WTA/LTA Circuits
Time-domain solutions are becoming more popular due to their compatibility with low-voltage CMOS technology. A number of time-domain WTA configurations have been reported in the literature [12,33,48,49]. Recent time-domain configurations are becoming comparable with the CC and BT classes in terms of speed, power, and resolution. The first time-domain WTA circuit discussed in this section is based on self-resetting integrate-and-fire neurons [19]. Each neuron functions as a WTA cell, as shown in Figure 14. The internal capacitor (Csoma) is charged by the input current of the cell, so a larger current charges the capacitor faster. The first neuron to reach the switching threshold voltage of the inner inverter pulls up its output and generates an output spike. This first spike resets the other cells and holds their outputs at zero until the next sampling time. Large capacitors are needed for this circuit to reach higher resolutions, which reduces its speed.
A similar approach was applied in [35] for imaging. The capacitor in every pixel of the image sensor is precharged, as shown in Figure 15. The pixel capacitor is then discharged by a current source that depends on the intensity of the incident light. Two inverters detect the instant at which the capacitor voltage reaches the threshold voltage VDD/2. As such, the input signals of the D-type flip-flops (DFFs) and NAND gates (V1, V2, ..., VN) are digital pulses with different delays. As soon as the output of the pixel with the winning current becomes Low, the output of the NAND gate pulls down, and all DFFs are clocked at the falling edge of Vx. The DFF output corresponding to the winner thus changes to High while the rest remain Low. Using an open-loop structure to compare the input-dependent delay times lowers the resolution of this circuit. For instance, when two input currents are close, the phase detector will not be able to detect the first pulse. Non-unique winners may also occur, in which more than one output becomes High.
The combination reported in [43] is meant for the learning engine of neural networks based on parallel activity. In each cell, a linear delay element converts the input voltage to a delayed pulse. A sensing amplifier is then used to detect the winning pulse, which corresponds to the larger input voltage.
The time-domain WTA circuit illustrated in Figure 16a is presented in [47]. Here, the input signals control a reference clock pulse within the voltage-controlled delay lines (VCDLs). The implementation of the VCDL blocks is depicted in Figure 16b. The delays corresponding to the inputs are proportional to the number of VCDL stages (N). Hence, the value of N can be chosen based on the required resolution. Conceptually, the delays corresponding to the inputs are amplified by the VCDLs, and the positive-feedback phase detector detects the first pulse and deactivates the other outputs. This amplification of time through the VCDLs also enhances the resolution. Other advantages of this circuit are high-speed, low-power, and low-voltage operation. Despite these advantages, it suffers from a limited input common-mode voltage range. Specifically, at least one input voltage must be greater than an NMOS threshold voltage, which is critical for low-voltage operation. The area also increases for high-resolution applications.
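The race between charging capacitors described in this section can be illustrated with a minimal behavioural sketch. It assumes an ideal constant-current charge, so that each cell reaches the threshold at t = C·Vth/I. The function names, the default component values, and the detector timing resolution `t_res` are illustrative assumptions and are not taken from [19] or [35].

```python
def time_to_threshold(i_in, c_soma, v_th):
    """Time for a capacitor charged by a constant current to reach v_th.

    Ideal model (assumption): t = C * V / I, with no leakage or parasitics.
    """
    if i_in <= 0:
        return float("inf")  # a cell with no input current never fires
    return c_soma * v_th / i_in


def time_domain_wta(currents, c_soma=1e-12, v_th=0.5, t_res=0.0):
    """Behavioural time-domain WTA: the cell that fires first wins.

    t_res is a hypothetical minimum time gap that the detector can
    distinguish. Returns (winner_index, firing_times, resolved), where
    resolved is False if the two earliest firing times are closer than
    t_res, i.e. the winner may not be unique.
    """
    times = [time_to_threshold(i, c_soma, v_th) for i in currents]
    order = sorted(range(len(times)), key=lambda k: times[k])
    winner = order[0]
    resolved = len(order) < 2 or (times[order[1]] - times[winner]) > t_res
    return winner, times, resolved


if __name__ == "__main__":
    inputs = [1.00e-6, 1.02e-6, 0.50e-6]  # input currents in amperes
    for cap in (1e-12, 4e-12):            # capacitor values in farads
        w, t, ok = time_domain_wta(inputs, c_soma=cap, t_res=20e-9)
        print(f"C={cap:.0e} F: winner={w}, t_win={t[w]:.3e} s, resolved={ok}")
```

In this idealized model, a larger capacitor widens the time gap between two close inputs, so the same detector can separate them, but the winning spike also arrives later. This mirrors the resolution versus speed trade-off noted for the integrate-and-fire circuit.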

Comparison and Discussions
The performance of WTAs/LTAs can be compared from various perspectives. Resolution, power, area, speed, complexity, supply voltage range, compatibility with CMOS technology, and the number of inputs should all be accounted for in a fair comparison. Most of the circuit improvements of WTAs/LTAs were reviewed in the previous section. First, a general comparison is made between the three WTA/LTA classes. It should be noted that the following results are based on the data reported in the original publications and not on a new design phase. Figure 17 compares the speed and power of the CC, BT, and TD configurations. CC topologies not only achieve higher speeds but can also consume less power. By comparison, BT architectures can reach better accuracy levels at the price of lower speed and higher power consumption caused by the larger number of internal layers. Very little data are available on the time-domain WTAs. Nonetheless, low power and medium speed can be expected from these architectures. From the accuracy point of view, BT topologies have a significant advantage. The ability to process many inputs has also increased demand for the corresponding implementations in recent years. Overall, both CC and BT circuits have found their particular applications, depending on the advantages of one category over the other in speed, area, power, accuracy/precision, number of inputs, and reliability. The main advantage of TD design is its flexibility across different applications. Not only can these architectures be part of a low-power and low-voltage design, but their technology compatibility and digital nature also make them well suited to medium-frequency and high-resolution applications.
Here, N and f refer to the number of inputs and the maximum operating frequency, respectively. From the viewpoint of low-voltage operation, the circuits presented in [28,43,47,48] are more promising, while the implementation presented in [43] has a relatively higher operating frequency. On the other hand, the TD circuit reported in [47] operates at a lower voltage and thus exhibits a superior FoM.
In terms of speed, the circuits in [20,43] show better metrics. The configuration in [43] is capable of operating with a large number of inputs and, as a result, exhibits a better FoM, whereas the circuit in [20] is compact and more accurate. In terms of accuracy, excellent results have been reported in [20,28,33,38,47]. The binary-tree structure is superior since it only compares two inputs at a time. This advantage becomes more prominent as the number of inputs increases, at the cost of more power consumption and lower speed. Current conveyor and time-domain WTA circuits can obtain different accuracies depending on their implementation, but the time-domain structures offer lower power consumption and have no stability issues.
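The pairwise comparison used by binary-tree structures can be sketched behaviourally as a tournament. The example below is an illustration under simple assumptions (ideal two-input comparators, an unpaired input passing to the next layer unchanged) and does not model any specific circuit from the cited works; the function name and the `pick_min` flag are hypothetical.

```python
def binary_tree_wta(inputs, pick_min=False):
    """Tournament-style WTA (or LTA when pick_min is True).

    Each layer compares neighbouring pairs with an ideal two-input
    comparator. Returns (value, index, n_comparisons, n_layers).
    """
    if not inputs:
        raise ValueError("at least one input is required")
    contenders = list(enumerate(inputs))  # (original index, value)
    comparisons = 0
    layers = 0
    while len(contenders) > 1:
        next_layer = []
        for k in range(0, len(contenders) - 1, 2):
            a, b = contenders[k], contenders[k + 1]
            if pick_min:
                next_layer.append(a if a[1] <= b[1] else b)
            else:
                next_layer.append(a if a[1] >= b[1] else b)
            comparisons += 1
        if len(contenders) % 2 == 1:
            next_layer.append(contenders[-1])  # unpaired input gets a bye
        contenders = next_layer
        layers += 1
    index, value = contenders[0]
    return value, index, comparisons, layers


if __name__ == "__main__":
    samples = [3, 9, 1, 7, 5, 2, 8, 4]
    print(binary_tree_wta(samples))                 # winner-take-all
    print(binary_tree_wta(samples, pick_min=True))  # loser-take-all
```

In this model, N inputs need N − 1 two-input comparisons spread over about log2(N) layers. Each decision involves only two signals, while the number of layers, and with it the delay and power, grows with N.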
Evidently, the precision of a binary-tree WTA is independent of its number of inputs. Analytically, we can hence claim that the precision of the BT topologies surpasses that of other implementations, especially for an increased number of inputs. Nonetheless, the calculation of the precision is mostly ignored in the literature. To sum up, this claim is analytically reasonable, although little data are available to prove it statistically. From the perspective of power consumption, the circuits in [28,47,49] reach the lowest power per cell, but those in [28,47] show better resolution and FoM. Regarding the occupied area, CC-based topologies occupy less area than BT circuits. As an exception, the TD WTA circuit reported in [47] shows a comparable area.
Table 1 presents a comprehensive comparison between the main WTAs presented in the prior art. The highest performance metrics belong to [28,43,47]. Regardless of the architecture, both the technology node and the supply voltage strongly affect the operation of WTA/LTA circuits. Our comparison table includes a number of old structures in outdated technologies (0.5-2.4 µm). These early studies are presented in this review only to investigate the trend of WTA/LTA design. As in any other field, the primitive WTA/LTA configurations suffer from more complexity, poor efficiency, and high consumption of power and silicon area. Most of the early solutions cannot even be realized under the reduced supply voltage of nano-scale technologies.
From a technology point of view, the main issues are speed, power, and supply voltage. Circuit design in new technologies benefits from high speed and less silicon area. However, there are some challenges, such as leakage current, higher cost, and more complexity. When choosing a technology, if high speed is not required, using an older process node with a supply voltage lower than nominal is a good choice. This can reduce both power consumption and manufacturing costs simultaneously. However, the area will increase. For high-speed circuits in which power consumption is not the issue, choosing a new technology node is thus suggested. However, it should be kept in mind that older architectures with more stacked transistors cannot be implemented at a low supply voltage. Finally, since speed and precision are traded against each other, it is difficult to choose a technology for a high-precision design. Nevertheless, the time-domain WTAs, with their technology compatibility, are better suited for precise implementations.
Figure 18 summarizes the FoM vs. supply voltage of the circuits reported in Table 1. From these results, it can be concluded that the operating voltage is related to technology scaling. Also, technology scaling does not improve the performance of CC architectures. This was expected since most of these circuits are analog. Another point from Figure 18 is that the performance of BT circuits improves almost linearly with scaling. This is because of the digital nature of these structures.
Generally, there are three types of applications for WTA/LTA circuits. The first type is the set of applications that call for high speed and high resolution with a smaller number of inputs. The second type comprises implementations that need precision/accuracy despite a large number of inputs. The third type requires very compact and high-speed circuits with medium resolution and a large number of inputs. Figure 19 gives a full statistical view of the circuits presented in recent years. The average speed of the WTAs has been increasing over the past decades. In contrast, the power consumption and FoM show a significant reduction. This is mainly by virtue of technological improvement and greater demand for low-power and high-frequency applications.

Conclusions
In this review, we presented an overview of the present WTA/LTA solutions to help devise suitable solutions for future designs. First, we briefly reviewed the research works published on different designs and their applications over the past decades. Classifications of the present WTA/LTA architectures were then presented. The main advantages and disadvantages of the CC, BT, and TD topologies were also described. Specifically, power consumption, speed, resolution, area, number of inputs, and low-voltage operation were studied and compared.

Figure 1. Commonly used approaches for integrating WTA/LTA circuits: (a) current conveyor, (b) binary tree, and (c) time domain.

Figure 5. Cell 1 and cell k (out of n) of the WTA topology discussed in [20].

Figure 10 presents another derivation of Lazzaro's circuit with speed and accuracy advantages [37]. The output impedance at node Vc is decreased by the additional feedback loops through Mi3. The circuit shows superior performance with respect to the original Lazzaro circuit. However, power consumption and area are increased because of the additional branches.

Figure 14. The neuro-WTA cell presented in [19], shown together with the current source and inverter common to all cells.

Figure 16. Scheme of the time-domain WTA presented in [47]: (a) system-level implementation and (b) transistor-level implementation of the VCDL block.

Figure 19. Average speed, power, and FoM of WTA circuits vs. year.

Table 1. Performance comparison of different WTA/LTA circuits.
