1. Introduction
Nowadays, digital programmable circuits such as FPGAs are used widely in many industrial applications [1]. They offer possibilities for integrating complex hardware systems and their interfaces into a single FPGA chip. FPGA devices have reached a high maturity level in terms of performance, power consumption, and cost, which makes them suitable for different application fields [2] that involve sensor systems [3,4], control systems [5,6], haptic interfaces [7,8], robotics [9,10,11], and other advanced electronic devices for signal processing and communications, e.g., [12,13,14,15]. They can also be useful for advanced velocity estimation methods, since they can easily be designed for custom signal conditioning and provide fast dedicated digital logic [16]. FPGAs can also ensure highly accurate and short sampling periods, which are required for advanced motion control applications. A short processing time of control algorithms is also possible; however, feature-rich FPGAs, whose abundant resources are needed for complex computation algorithms [17,18], are still expensive, which may prevent their further use in low-cost applications. One of the challenges in the electronics industry nowadays concerns fitting complex circuitry into small spaces. Thus, current FPGA development is also directed toward smaller design solutions with low power consumption and a small footprint at low cost, while maintaining a solid performance level [19]. Such low-power, small-size, and cost-optimized FPGAs introduce the possibility of developing new cutting-edge solutions, e.g., in the area of smart sensors and other smart devices [20].
The importance of velocity feedback for the closed-loop control of mechatronic systems has been well known for many decades. Typical feedback devices in the control systems for servo drives in automation are incremental encoders [21]. These control loops must allow smooth motion in a wide range from high speed to low speed. Incremental encoders, which are typically mounted on a motor shaft, can provide high resolution and high accuracy in position measurement, which makes them an optimal option for such a purpose [22]. Moreover, the maturity level of digital electronics technology has enabled their use not only for position feedback, but also for velocity feedback [23]. They generate digital pulse train signals while their shaft, which is coupled with the motor shaft, rotates. Precise feedback information is essential for motor control, e.g., in machine tools, robotics, autonomous systems, and medical mechatronics. Regardless of the motor control method, the accuracy of feedback information has a major influence on the control performance. Furthermore, advanced motion control applications require highly accurate wide-range velocity information with high bandwidth [24,25]. Thus, since incremental encoders can only generate a pulse train whose frequency is related to the shaft velocity, a suitable estimation algorithm should be applied in order to obtain smooth velocity information. Although this issue has been researched extensively over the past few decades [26,27], new emerging applications in robotics and other areas with cutting-edge requirements foster further intensive research in this field, e.g., the motion control of hydraulic actuators [28], pneumatic actuators [29], haptic interfaces [30], mobile robots [31], robot manipulators [32], humanoid robots [33], collaborative robots for safe physical human–robot interaction [34], and automotive applications [35]. Since the performance of a velocity control loop is related strongly to the quality of the velocity feedback information, achieving a smooth, high-bandwidth velocity estimate, which is required for oscillation-free high-bandwidth control loops, still presents quite a challenge.
The motor velocity is conventionally obtained by counting encoder pulses in a fixed sampling period. However, despite its simple implementation, this approach cannot provide precise velocity information in the case of a low-resolution encoder or short sampling periods. In these cases, it produces large quantization errors due to the spatial position quantization that is inherent in incremental encoders. Therefore, more advanced estimation methods should be applied in order to obtain smooth velocity information. They can be classified as model-based or non-model-based methods [36]. Model-based methods use models of the dynamic systems for which the velocities are to be estimated. In general, these methods rely on a mechanical system model, which must be accurately available. This is often a significant obstacle in practice, since the model may be difficult to obtain, time varying, unknown, or uncertain. In these cases, non-model-based methods can be adopted. These methods are based on data processing techniques that use only measured data without any information about the motion system. The aforementioned simplest conventional method is usually referred to as the M method. Due to its problems at low speed, filtering of the data may be applied, although it introduces phase lag, which is undesired in advanced motion control applications. The alternative T method can be utilized instead, in which the time interval between two adjacent encoder pulses is measured by counting high-frequency clock pulses. Then, the velocity information is obtained as the reciprocal of the measured time interval, i.e., by arithmetic division. Although the method can provide fine velocity estimation at low speed, it is prone to errors at high speed. Thus, the mainstream velocity measurement methods combine the M method and the T method. Since the pioneering work of Ohmae [37], the MT method has been applied widely, because it works well within wide speed ranges. The method counts the number of encoder pulses in a sampling period and measures the time interval between the boundary encoder pulses by counting the high-frequency clock pulses. Then, the velocity information is computed by arithmetic division of the number of encoder pulses by the time interval. Some variations of the MT method appeared in [38,39,40]. Further research deals with performance improvement of the MT method and its enhanced robustness to encoder imperfections and hardware inaccuracies [16,25,41,42]. The time stamping of encoder pulses presents a generalization of the MT method [43,44]. Here, the velocity is estimated by a polynomial fitting through a number of time-stamped encoder counts, which significantly increases the computation complexity.
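To make the difference between the three non-model-based estimators concrete, the following minimal Python sketch computes the velocity from raw counter values with the M, T, and MT methods. The function names, the 10 MHz clock, the 1 ms sampling period, and the counter values are illustrative assumptions and are not taken from the experimental setup; velocities are expressed in encoder pulses per second.

```python
def m_method(pulse_count, ts):
    """M method: encoder pulses counted in one sampling period ts (seconds)."""
    return pulse_count / ts


def t_method(clock_ticks, f_clk):
    """T method: reciprocal of the time between two adjacent encoder pulses,
    where the time is measured as a number of clock ticks at f_clk (Hz)."""
    if clock_ticks <= 0:
        raise ValueError("clock_ticks must be positive")
    return f_clk / clock_ticks


def mt_method(pulse_count, clock_ticks, f_clk):
    """MT method: pulse count divided by the time between the boundary pulses.
    Returns None for a blank sampling interval (no pulse, no update)."""
    if pulse_count == 0:
        return None
    return pulse_count * f_clk / clock_ticks


# Hypothetical numbers: true speed 2500 pulses/s, i.e., 0.4 ms between pulses.
F_CLK = 10_000_000   # assumed 10 MHz counting clock
TS = 0.001           # assumed 1 ms sampling period

v_m = m_method(2, TS)              # only 2 whole pulses fell into this period
v_t = t_method(4000, F_CLK)        # 4000 ticks = 0.4 ms between adjacent pulses
v_mt = mt_method(2, 8000, F_CLK)   # 2 pulses spanning 8000 ticks = 0.8 ms
```

In this example the M method is quantized to multiples of 1000 pulses/s, whereas the T and MT methods recover the assumed speed; all of the time-based variants, however, need one division per update.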
The MT method consists of signal reading and data processing. Common implementations may conveniently combine an FPGA for the hardware part of the method, which involves signal acquisition, counting encoder pulses, and counting clock pulses, with a computation engine such as a DSP for processing the software algorithm of the method in real time [16,26,45,46]. Such a heterogeneous system structure significantly increases the system complexity. Thus, it may be convenient to implement the velocity estimation method on an FPGA as a whole. However, one should note that the algorithm of the MT method involves arithmetic division. Although the division operation is easy to perform on a DSP, it is particularly difficult for FPGA-based embedded electronics, i.e., its implementation is characterized by high computation latency and a high consumption of FPGA resources [47,48,49]. Although modern feature-rich FPGAs with abundant resources are capable of performing such tasks, this solution raises the cost drastically, which may be unacceptable in many cases. Therefore, for the efficient implementation of the MT method on an FPGA, it is necessary to eliminate the division operation that is inherent in the conventional algorithm. Then, its implementation will be possible on cost-optimized FPGAs as well. Such FPGAs, on the other hand, can embed digital multipliers, or so-called DSP blocks, that perform arithmetic multiplication and addition efficiently [50]. Therefore, division-less algorithms can be implemented easily in an efficient way with high-speed processing. Zhu [51] proposed a simple solution that seamlessly combines the M method and the T method. It applies a sophisticated mechanism with two counters and an accumulator; however, it requires the execution of the algorithm continuously in every clock cycle, which increases the resources and power consumption, and thus is undesired [49].
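As a generic illustration of why multipliers and adders can suffice, the sketch below replaces the division of the MT formula with the textbook Newton-Raphson reciprocal iteration x ← x(2 − d·x). This is not the algorithm of Zhu [51] nor any of the division-less MT-type algorithms discussed in this paper; the function name, the initial guess, and the counter values are assumptions chosen only for the example.

```python
def reciprocal_newton(d, x0, iterations=5):
    """Approximate 1/d using only multiplication and subtraction.

    The iteration converges quadratically when 0 < x0 < 2/d.
    """
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - d * x)  # two multiplications and one subtraction
    return x


# Hypothetical counter values: 2 encoder pulses in 8000 ticks of a 10 MHz clock.
F_CLK = 10_000_000
pulses, ticks = 2, 8000

inv_ticks = reciprocal_newton(ticks, x0=1.0e-4)   # assumed initial guess
velocity = pulses * F_CLK * inv_ticks             # pulses per second, no division
```

In a recursive estimator the previous result is a natural initial guess, which is one reason why stability and convergence of such recursions have to be examined explicitly.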
In our research, we are concerned with the efficient implementation of the MT method for smooth velocity estimation on a low-cost FPGA, as illustrated by Figure 1. We introduced a novel MT-type method in [52] that eliminates the arithmetic division from the algorithm, while preserving the accuracy of the conventional MT method. Thus, it is convenient for implementation on small-sized cost-optimized FPGAs. The method is abbreviated as DLMT (division-less MT). It involves multiplication and addition, which are the only arithmetic operations required in the real-time calculation. However, the method is recursive; thus, stability and convergence should be considered strictly in the design phase. It was found that the basic algorithm can suffer from instability at low speed in the case of blank sampling intervals, when encoder pulses are widely spaced. Therefore, it has been adapted to guarantee asymptotically stable performance at all speed ranges [53]. Such a generalized DLMT method has been abbreviated as the GDLMT method. The method was implemented on the FPGA and verified experimentally. Although the method performed well in practical experiments, it is represented as a time-varying discrete system of the second order that may generate a weakly damped, fluctuating response. Thus, in this paper, we revise the algorithm to reduce the system order, and propose a division-less algorithm with simplified first-order dynamics. We show that, inherently, it significantly reduces undesired fluctuations during the transient phase. The stability and convergence are also discussed theoretically and proven formally. Furthermore, the estimation error is reduced as well, which is confirmed by experimental results.
3. Results
The proposed division-less MT-type methods for velocity estimation may show instability in practical implementation when blank sampling intervals appear at low speed. The problem is discussed in detail in [53], where the basic DLMT2 algorithm has been adapted in order to guarantee stability in wide-speed-range operation. This generalized version of the algorithm is abbreviated as GDLMT2. Similarly, the basic DLMT1 algorithm must be generalized for stable practical operation as well. In this paper, we adopt the same approach as that introduced in [53]. The adapted version is denoted as GDLMT1.
Based on the experience in [53], we first performed experiments with the testing IE and its flywheel mechanically decoupled from the motor shaft. Thus, we could remove the mechanical noise from the IE rotation as much as possible, since induced mechanical noise may blur the tested method’s performance. However, in the case of a manually driven IE (i.e., driving the encoder flywheel by hand), we could not control the velocity profile, and the achieved maximum speed was quite limited. Thus, we continued performing experiments with the IE coupled to the motor shaft. In this case, the IE was involved in the control loop, providing velocity feedback for the motor control. These experiments validated the proposed velocity estimation algorithm for motion control applications.
We performed numerous experiments with the manually driven IE (so-called manual experiments) and the motor-driven IE (so-called motorized experiments). We recorded the raw sampled data from the IE and the velocity computed in real time on the FPGA board (GDLMT1 method) and on the DSP controller (MT method). The recorded raw sampled data enabled us to calculate the velocity on the desktop computer in offline post-processing mode using the MATLAB software package. Thus, we could compare the performance of the proposed algorithm with other algorithms. We also included the filtered M method with a second-order Butterworth filter, which is a very common method for velocity estimation in less demanding motion control applications. However, we assumed that the original MT method can be considered as the reference method; thus, we compared the division-less algorithms with that one. The diagrams below showing experimental results typically contain the following traces: (i) the sampled encoder pulses in a single sampling period (so-called encoder velocity), which is equivalent to the basic M method (light green solid line trace), (ii) the velocity computed by the MT method on the DSP controller in real time (dark red solid line trace with squares), (iii) the velocity computed by the fixed-point GDLMT1 algorithm on the FPGA in real time (pink solid line trace with diamonds), (iv) the velocity computed offline by the floating-point GDLMT1 algorithm on the desktop computer (red solid line trace with stars), (v) the velocity computed offline by the floating-point GDLMT2 algorithm on the desktop computer (light red solid line trace with circles), (vi)–(vii) the velocity obtained by Butterworth filtering with the cut-off frequencies of 100 Hz (dark green solid line trace with dots) and 50 Hz (green dash dot line trace), respectively.
Additionally, in the manual experiments, we show the so-called “actual” velocity, whereas in the motorized experiments, we show the reference velocity (blue dashed line trace in both cases). The “actual” velocity was obtained offline by a non-causal discrete zero-phase low-pass Butterworth filter with a cut-off frequency of 50 Hz, whereas the reference velocity was computed on the DSP controller in real time. The velocity is given in units of “position pulses per sampling period” (pp/Ts).
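The filtered M method and the non-causal zero-phase reference can be sketched in a few lines of plain Python. The block below designs a second-order Butterworth low-pass filter with the bilinear transform, applies it causally, and applies it forward and backward for a zero-phase result. The 1 kHz sampling rate, the test signals, and all function names are assumptions made for this illustration; the processing in the paper was done in MATLAB and on the DSP controller.

```python
import math


def butter2_lowpass(fc, fs):
    """Second-order Butterworth low-pass coefficients (bilinear transform)."""
    k = math.tan(math.pi * fc / fs)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def causal_filter(b, a, x):
    """Direct-form difference equation with zero initial conditions."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def zero_phase_filter(b, a, x):
    """Non-causal zero-phase filtering: filter forward, then backward."""
    forward = causal_filter(b, a, x)
    backward = causal_filter(b, a, forward[::-1])
    return backward[::-1]


def ripple(y, tail=50):
    """Peak-to-peak value over the last `tail` samples."""
    segment = y[-tail:]
    return max(segment) - min(segment)


def argmax(y):
    return max(range(len(y)), key=y.__getitem__)


FS = 1000.0  # assumed sampling rate in Hz (Ts = 1 ms)

# Encoder velocity at low speed: one pulse every fifth period (0.2 pp/Ts).
encoder_velocity = [1.0 if n % 5 == 0 else 0.0 for n in range(500)]
b100, a100 = butter2_lowpass(100.0, FS)
b50, a50 = butter2_lowpass(50.0, FS)
bw100 = causal_filter(b100, a100, encoder_velocity)
bw50 = causal_filter(b50, a50, encoder_velocity)
ripple_bw100 = ripple(bw100)
ripple_bw50 = ripple(bw50)

# A symmetric velocity bump centred at sample 200 exposes the phase lag.
bump = [max(0.0, 1.0 - abs(n - 200) / 40.0) for n in range(401)]
peak_causal = argmax(causal_filter(b50, a50, bump))
peak_zero_phase = argmax(zero_phase_filter(b50, a50, bump))
```

Under these assumptions the 100 Hz filter leaves more pulse-induced ripple than the 50 Hz filter, while the causal 50 Hz filter shifts the peak of the bump to a later sample and the forward-backward version keeps it in place.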
3.1. The Manual Experiments
Figure 11 shows the results from manual experiment number one. In this experiment, we pushed the encoder flywheel by hand strongly, and then left it to stop gradually by itself. Thus, we achieved smooth motion with the velocity peak close to 5 pp/Ts. The upper plots in Figure 11a–c show: (i) the encoder velocity, (ii) the MT velocity computed on the DSP controller in real time, and (iii) the velocity obtained by one of the division-less MT-type algorithms, i.e., (a) the fixed-point GDLMT1 algorithm on the FPGA, (b) the floating-point GDLMT1 algorithm on the offline computer, or (c) the floating-point GDLMT2 algorithm on the offline computer. The bottom plots in Figure 11a–c show the corresponding velocity error with reference to the MT velocity. The velocity plots demonstrate smooth traces of the MT-type methods that match almost perfectly. The error plots further confirm that they match well, since the velocity error stays very low, i.e., below 0.02 pp/Ts. However, it is evident that the error peaks in the GDLMT2 method exceed that limit. The error increased during the high acceleration phase at the start; however, it practically disappeared after the velocity peak. The plots in Figure 11a,b also show clearly that the error traces of both GDLMT1 implementations are almost identical. This verifies the fixed-point implementation on the FPGA. If we compare the error plot in Figure 11c with the corresponding error plot in Figure 11a or Figure 11b, then we can observe a slightly larger error in the case of the GDLMT2 method over the whole time interval. The next plots in Figure 11d–f show some interesting details from the velocity plot in Figure 11a: (i) Detail 1 focuses on the velocity peak, (ii) Detail 2 focuses on the crossing to the velocity below 1 pp/Ts, when blank sampling intervals start to appear, and (iii) Detail 3 focuses on the final stop phase. The details show the encoder velocity, the MT velocity, and the GDLMT1 velocity; we also added some more velocity traces for comparison: the “actual” velocity and the velocities obtained by Butterworth filtering with 100 Hz (abbreviated as BW100) and 50 Hz (abbreviated as BW50), respectively. The plot in Figure 11d shows smooth velocity traces in all cases; the MT-type velocities match well with the “actual” velocity, while the BW100 and BW50 traces show a significant phase lag, as expected. The plot in Figure 11e shows very similar performance for the MT-type velocity traces; however, the BW100 velocity trace fluctuates synchronously with the alternation of the encoder velocity. A relatively large fluctuation of the BW100 velocity trace can also be observed in Figure 11f. In this case, the BW50 velocity trace fluctuates as well, although to a lesser degree. This phenomenon is due to the sensitivity of the filter to the individual input pulses generated by the encoder at low speed. These rare encoder pulses, on the other hand, cause the stair-like MT-type velocity traces, which match perfectly. However, a delay with reference to the “actual” velocity is evident as well. This is because the MT method updates only if at least one encoder pulse occurs, and, in this case, it can only calculate an average velocity over the most recent interpulse time interval (i.e., the time interval between two adjacent encoder pulses). If the interpulse interval is relatively long, then a significant error with reference to the actual velocity at the current sampling instant can appear in the case of accelerated or decelerated motion. When motion stops, the encoder pulses disappear as well, and the MT method cannot update the velocity. In order to deal with this problem, we simply zeroed the velocity after a predefined time interval elapsed (i.e., 10 ms). In order to provide a more accurate estimation in this case, an approach such as that in [56] can be adopted; however, this is outside the scope of this paper.
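The hold-and-zero behavior described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the FPGA or DSP implementation: the class name and interface are hypothetical, and the timeout is expressed as a number of sampling periods (10 periods corresponds to 10 ms only if the sampling period is 1 ms, which is assumed here).

```python
class HoldAndZeroVelocity:
    """Holds the last MT-type estimate during blank sampling intervals and
    zeroes it once no encoder pulse has arrived for `timeout_samples` periods."""

    def __init__(self, timeout_samples=10):
        self.timeout_samples = timeout_samples  # assumed: 10 ms at Ts = 1 ms
        self.velocity = 0.0
        self.blank_samples = 0

    def update(self, pulse_count, new_estimate=None):
        if pulse_count != 0:
            # At least one encoder pulse occurred: accept the new estimate.
            self.velocity = new_estimate
            self.blank_samples = 0
        else:
            # Blank sampling interval: hold the old value until the timeout.
            self.blank_samples += 1
            if self.blank_samples >= self.timeout_samples:
                self.velocity = 0.0
        return self.velocity


# Usage: one update with a pulse, followed by ten blank sampling intervals.
estimator = HoldAndZeroVelocity(timeout_samples=10)
outputs = [estimator.update(1, 0.5)]
outputs += [estimator.update(0) for _ in range(10)]
```

The held value explains the stair-like traces at low speed: between pulses the output stays constant, and it drops to zero only after the timeout.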
In manual experiment number two, we performed sinusoidal motion at relatively low speed. In this case, the velocity did not exceed 1 pp/Ts. The results are depicted in Figure 12, which is organized similarly to Figure 11. We can also make very similar observations for the diagrams in Figure 12a–c. Some interesting details from Figure 12a are depicted in Figure 12d–f, which focus on the saddle point (d), the zero-crossing (e), and the stop with overshoot (f). Figure 12d shows smooth velocity traces for the MT method and the GDLMT1 method, which both match the “actual” velocity almost perfectly. The BW50 velocity trace is quite smooth as well; however, the BW100 velocity trace shows fluctuation. Such fluctuation is even more pronounced in Figure 12e,f. Here, we can observe the stair-like velocity traces of the MT-type methods, which appear due to rare encoder pulses and blank sampling intervals. The match between the GDLMT1 velocity trace and the MT velocity trace is evident, whereas the lag with reference to the “actual” velocity, which results from averaging the velocity over the most recent interpulse interval, can also be observed clearly.
3.2. The Motorized Experiments
Figure 13 illustrates the experimental results with the trapezoidal velocity profile of relatively high set point speed (10 pp/Ts), which were obtained with the motor-driven configuration. The servo drive system was configured in velocity closed-loop control with the testing IE for velocity feedback. Figure 13a depicts the results with the GDLMT1 algorithm implemented on the FPGA, whereas Figure 13b depicts the results with the MT velocity computed on the DSP. The upper plots show the encoder velocity, the reference velocity, and the feedback velocity. Both plots demonstrate good matching of the feedback velocity with the reference velocity. The middle plots show the control error, i.e., the difference between the reference velocity and the feedback velocity. The control errors are almost the same. The bottom plot in Figure 13a shows the velocity error of the feedback velocity used, i.e., the one obtained by the GDLMT1 algorithm on the FPGA, and the plot in Figure 13b shows the velocity error of the GDLMT2 velocity computed offline. By comparing these plots, we can observe a larger error in the case of the GDLMT2 velocity. Figure 13c,d show details 1 and 2, respectively, from Figure 13a. Detail 1 illustrates the overshoot when the actual velocity converged to the set point speed, whereas detail 2 focuses on the traveling phase. We can observe an overshoot of 0.1 pp/Ts, a fairly smooth velocity in the transient phase, a small ripple with a magnitude of 0.005 pp/Ts in the GDLMT1 velocity due to mechanical noise, and a larger fluctuation of the velocity obtained by the filtering method (at BW50, and even more pronounced at BW100) caused by the alternation of the encoder pulses.
We then repeated the experiment; however, we changed the feedback velocity. Instead of the MT-type velocity, we applied the velocity signal obtained by the Butterworth filter (the filtered M method velocity). The control loop remained tuned as it was in Figure 13. The results are depicted in Figure 14: diagrams (a)–(c) show the case with the BW100 velocity, whereas diagrams (d)–(f) show the case with the BW50 velocity. More specifically, Figure 14a,d depict the reference velocity profile, along with the encoder velocity and the feedback velocity on the upper plots, the control error on the middle plots, and the difference between the reference velocity and the GDLMT1 velocity on the bottom plots. The “big picture” illustrated by the upper plots hides the details; however, it is easy to observe the permanent noise of the encoder velocity. The control error reveals a relatively high ripple in the case of the BW100 velocity, which is also visible in the GDLMT1 velocity error on the plot below. In the case of the BW50 velocity, one can see more oscillations in the transient phase. Details 1 and 2 from Figure 14a are illustrated by the plots in Figure 14b,c, respectively. An overshoot of ca. 0.14 pp/Ts can be noted in Figure 14a, which is 40% more than in the case with the GDLMT1 feedback velocity in Figure 13c. Furthermore, a large ripple is evident, not only in the BW100 velocity, but also in the GDLMT1 velocity. If we consider that the GDLMT1 velocity is close to the actual velocity, then we can deduce that this could be the true velocity. Thus, the ripple is caused by the BW100 velocity used in the feedback control. Details 1 and 2 from Figure 14d, which are shown in the plots in Figure 14e,f, broadly confirm the observations from the “big picture” for the BW50 velocity case, which include: (i) relatively large oscillations in the transient phase with an overshoot of ca. 0.20 pp/Ts (100% more than in the case with the GDLMT1 feedback velocity), which relates to lower stability of the control loop (a result of a worse phase margin due to the lower cut-off frequency), and (ii) a large ripple during the traveling phase. Thus, although the BW50 velocity was smoother than the BW100 velocity, its application in the feedback deteriorated the system output response further.
5. Conclusions
In this paper, we redesigned the computation algorithm of the well-known MT method for velocity estimation with incremental encoders. In our approach, we replaced the arithmetic division that is inherent to the conventional MT method with simpler arithmetic operations, namely multiplication and addition. This yields a division-less MT-type algorithm. This simplification enables the implementation of the MT-type algorithm on low-cost FPGAs. However, the algorithm is recursive, and is described as a time-variant discrete filter. The original proposal was described by a second-order system, which, under some conditions, could generate weakly damped responses with undesired fluctuations during the transient phase. We focused on order reduction, and thus modified the original algorithm such that we derived a recursive algorithm of the first order, which represents the main novelty of the paper. The stability and convergence of the proposed algorithm were discussed theoretically and proven formally. The numerical examples shown in the paper demonstrate significant improvement in the transient phase of the filter step response. The novel algorithm was also validated experimentally. It was implemented on the experimental FPGA board, and tested with an industrial rotary incremental encoder. The algorithm implementation occupied only a few FPGA resources, which qualifies it for modern small-size cost-optimized FPGAs. Furthermore, the execution time is negligible for motion control applications. The real-time experiments demonstrated smooth and highly accurate velocity estimation. The obtained experimental results were compared with other methods. The comparison showed excellent matching with the conventional MT method, and lower velocity error in comparison with the division-less MT-type algorithm of the second order. Therefore, the performance improvement was confirmed by practical experiments.
Further experimental validation focused on the application of the novel algorithm for velocity feedback in a simple closed-loop control system with a DC servo motor. The proposed method provided control performance similar to that of the original MT method. Furthermore, it clearly outperformed the filtered M method in terms of smoothness, phase lag, and transient response of the closed-loop control. Thus, the proposed division-less MT-type velocity estimation algorithm for low-cost FPGAs was validated for motion control applications.