Jet Features: Hardware-Friendly, Learned Convolutional Kernels for High-Speed Image Classification

Abstract: This paper explores a set of learned convolutional kernels which we call Jet Features. Jet Features are efficient to compute in software, easy to implement in hardware and perform well on visual inspection tasks. Because Jet Features can be learned, they can be used in machine learning algorithms. Using Jet Features, we make significant improvements on our previous work, the Evolution Constructed Features (ECO Features) algorithm. Not only do we gain a 3.7× speedup in software without losing any accuracy on the CIFAR-10 and MNIST datasets, but Jet Features also allow us to implement the algorithm on an FPGA using only a fraction of its resources. We hope to apply the benefits of Jet Features to Convolutional Neural Networks in the future.


Introduction
The field of computer vision has come a long way in solving the problem of image classification. Not too long ago, handcrafted convolutional kernels were a staple of all computer vision algorithms. With the advent of Convolutional Neural Networks (CNNs), however, handcrafted features have become the exception rather than the rule, and for good reason. CNNs have taken the field of computer vision to new heights by solving problems that used to be unapproachable or unthinkable. With deep learning, convolutional kernels can be learned from patterns seen in the data rather than pre-constructed by algorithm designers.
While CNNs are the most accurate solution to many computer vision tasks, they require many parameters and many calculations to achieve such accuracy. Efforts to reduce model size and operation complexity include binarization [1] and kernel separation [2]. In this work, we seek to speed up image classification on simple tasks by leveraging some of the mathematical properties found in classic handcrafted kernels and applying them in a procedural way with machine learning. This paper does not explore how these properties can be applied to CNNs with deep learning; we leave that for future work.
In this paper, we present Jet Features, a set of learned convolutional kernels. Convolutions with Jet Features are efficient to compute in both hardware and software. They take advantage of redundant calculations during convolutions and use only the simplest operations. We apply these features to our previous machine learning image classification algorithm, the Evolution Constructed Features (ECO Features) algorithm. We call this new version of the algorithm the Evolution Constructed Jet Features (ECO Jet Features) algorithm. It is accurate on simple image classification tasks, and can be efficiently run on embedded computer devices without the need for GPU acceleration. We specifically use Jet Features to allow the algorithm to be implemented in hardware. Jet Features are related to multiscale local jets [30], which are reviewed in Section 2.2, but we introduce them here in a more informal manner.

Introduction to Jet Features
Jet Features are convolutional kernels that can be separated into a series of very small kernels. In general, separable kernels are kernels that perform the same operation as a series of convolutions with smaller kernels. Figure 1 shows an example of a 3 × 3 convolutional kernel that can be separated into a series of convolutions with a 3 × 1 kernel and a 1 × 3 kernel. Jet Features take separability to an extreme, being separated into the smallest meaningful kernels, with only two elements. Specifically, all Jet Features can be separated into a series of convolutions with kernels from the set {[1,1], [1,1]^T, [1,−1], [1,−1]^T}, which are also shown in Figure 2. We will refer to these small kernels as the Jet Feature building blocks. Two of these kernels, [1,1] and [1,1]^T, can be seen as blurring or scaling factors; we will refer to them as scaling factors. The other two kernels, [1,−1] and [1,−1]^T, apply a difference between pixels in either the x or y direction and can be viewed as simple partial derivative operators; we will refer to them as partial derivative operators. Every Jet Feature is a series of convolutions with any number of these basic building blocks. With these building blocks, some of the most popular classic filters can be constructed. In Figure 3 we show how the Gaussian and Sobel filters can be broken down into Jet Feature building blocks. It is important to note that convolution is commutative, so the order in which the Jet Feature building blocks are applied does not matter. Therefore, every Jet Feature is defined by the number of each building block it uses. For example, the 3 × 3 x-direction Sobel kernel can be defined as 1 x-direction and 2 y-direction scaling factors and 1 x-direction partial derivative operator (see Figure 3).
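The composition of building blocks can be checked numerically. The sketch below is our own illustration (the `compose` and `conv2d_full` helpers are not from the paper); it builds the 3 × 3 x-direction Sobel kernel from one x-scaling, two y-scaling and one x-derivative building blocks:

```python
import numpy as np

def conv2d_full(a, b):
    """Full 2-D convolution of two small integer kernels."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1,
                    a.shape[1] + b.shape[1] - 1), dtype=int)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return out

# The four Jet Feature building blocks
scale_x = np.array([[1, 1]])   # scaling factor, x direction
scale_y = scale_x.T            # scaling factor, y direction
deriv_x = np.array([[1, -1]])  # partial derivative, x direction
deriv_y = deriv_x.T            # partial derivative, y direction

def compose(*blocks):
    """Collapse a series of building blocks into one equivalent kernel."""
    kernel = np.array([[1]])
    for b in blocks:
        kernel = conv2d_full(kernel, b)
    return kernel

# 1 x-scaling + 2 y-scalings + 1 x-derivative -> 3x3 Sobel (up to sign)
sobel_x = compose(scale_x, scale_y, scale_y, deriv_x)
print(sobel_x)  # [[1 0 -1] [2 0 -2] [1 0 -1]]
```

Because convolution is commutative, any ordering of the arguments to `compose` yields the same kernel.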

Multiscale Local Jets
We can more formally define a jet feature as an image transform that is selected from a multiscale local jet. All features for the algorithm are selected from the same multiscale local jet. Multiscale local jets were proposed by Florack et al. [30] as useful image representations that could capture both the scale and spatial information within an image. They have proven to be useful for various computer vision tasks such as feature matching, feature tracking, image classification and image compression [31,32,33]. Manzanera constructed a single unified system for several of these tasks using multiscale local jets and attributed its effectiveness to the fact that many other features are implicitly contained within a multiscale local jet [33]. Some of these popular features include the Gaussian blur, the Sobel operator and the Laplacian filter.
Multiscale local jets are a set of partial derivatives of a scale space of a function. Members of a multiscale local jet have been previously defined in [30,31,33] as

L_{x^m y^n}^σ(A) = δ_{x^m y^n}(G_σ ∗ A),    (1)

where A is an input image, δ_{x^m y^n} is a differential operator of degree m with respect to x and degree n with respect to y, and G_σ is the Gaussian operator with a variance of σ. A multiscale local jet is the set of outputs L_{x^m y^n}^σ(A) for a given range of values for m, n and σ:

J = { L_{x^m y^n}^σ(A) : 0 ≤ m ≤ M, 0 ≤ n ≤ N, σ ∈ Σ }.    (2)
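The scaling factors connect directly to the Gaussian operator G_σ in this definition: repeatedly convolving the [1,1] building block with itself produces binomial coefficients, the standard discrete approximation to a sampled Gaussian. A quick numerical check (our own illustration, not the paper's code):

```python
import numpy as np

# Repeated convolution of the [1, 1] scaling factor yields binomial
# coefficients, which approximate a Gaussian (by the central limit theorem).
k = np.array([1])
for _ in range(4):
    k = np.convolve(k, [1, 1])
print(k)  # [1 4 6 4 1], proportional to a discrete Gaussian
```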

The ECO Features Algorithm
We developed the original ECO Features algorithm in [3,4]. Its main purpose is to automatically construct good image features that can be used for classification. This eliminates the need for human experts to hand craft features for specific applications. This algorithm was developed as CNNs, which solve similar problems [34], were gaining popularity. We recognize that CNNs are able to achieve better accuracy than the ECO Features algorithm on most tasks, but ECO Features models are smaller and generally less computationally expensive. In this paper we are interested in the effectiveness of Jet Features in the ECO Features algorithm, where their impact is fairly straightforward to explore. Exploration of Jet Features in CNNs is left for future work.
An ECO Feature is a series of image transforms performed back to back on an input image. Figure 4 shows an example of a hypothetical ECO Feature. Each transform in the feature can have a number of parameters that change the effects of the transform. The algorithm starts with a predetermined pool of transforms which are selected by the user. Table 1 shows the pool of transforms used in [4]. The genetic algorithm initially forms ECO Features by selecting a random series of transforms and randomly setting each of their parameters. The parameters of each transform are modified through the process of mutation in the genetic algorithm. New orderings of the transforms are also created as pairs of ECO Features are joined together in genetic crossover, where the first part of one series is spliced with the latter portion of a different series. A graphical representation of mutation and crossover is shown in Figure 5. Each ECO Feature is paired with a classifier; an example is given in Figure 6. Originally, single perceptrons were used as the classifiers for each ECO Feature. Since perceptrons are only capable of binary classification, we extend the algorithm's capabilities to multiclass classification by using a random forest [35] in this work. Inputs are fed through the ECO Feature transforms and the outputs are fed into the classifier. A held-out set of images is then used to evaluate the accuracy of each ECO Feature. This accuracy is used as a fitness score when performing genetic selection in the genetic algorithm. ECO Features with high fitness scores are propagated to future rounds of evolution while weak ECO Features die off. The genetic algorithm continues until a single ECO Feature outperforms all others for a set number of consecutive generations. This ECO Feature is selected and saved while all others are discarded. The whole process is repeated for every ECO Feature produced by the genetic algorithm.

Figure 6. An example pairing of an ECO Feature with a random forest classifier. Every ECO Feature is paired with its own classifier. Originally, perceptrons were used, but in our work, random forests are used, which offer multiclass classification.
As the genetic algorithm selects ECO Features, they are combined to form an ensemble using a boosting algorithm. We use the SAMME [36] variation of AdaBoost [37] for multiclass classification. The boosting algorithm adjusts the weights of the dataset after each ECO Feature is created, giving importance to harder examples. This leads to ECO Features tailored to certain aspects of the dataset. Once the desired number of ECO Features have been constructed, they are combined into an ensemble. This ensemble predicts the class of new input images by passing the image through all of the ECO Feature learners, letting each one vote for which class should be predicted. Figure 7 depicts a complete ECO Features system. Since the publication of [3,4] we have applied ECO Features to the problem of visual inspection, where they were used to determine the maturity of date fruits [38]. This algorithm has also been used in industry to automate visual inspection for other processes.

The ECO Jet Features Algorithm
In this section we look at how Jet Features can be introduced into the ECO Features algorithm. We call this modified version the ECO Jet Features algorithm. This modification speeds up performance while maintaining accuracy on simple image classification. It was specifically designed to allow for easy implementation in hardware.

Jet Feature Selection
The ECO Jet Features algorithm uses a genetic algorithm similar to the one discussed in Section 3. Instead of selecting image transforms from a pool and combining them into a series, it simply uses a single Jet Feature. The numbers of scaling factors and partial derivatives are the parameters that are tuned through evolution. These four parameters are bounded from 0 to a set maximum, forming the multiscale local jet, similar to Equation (2). We found that bounding the partial derivatives, δ_x, δ_y ∈ [0, 2], and scaling factors, σ_x, σ_y ∈ [0, 6], is effective at finding good features.
In order to accommodate the use of Jet Features, mutation and crossover are redefined. The four parameters of the Jet Feature, δ_x, δ_y, σ_x and σ_y, are treated like genes that make up the genome of the feature. During mutation, the values of these individual parameters are altered. During crossover, each gene of a child Jet Feature is copied from either the father or the mother genome, with the selection made randomly. This is illustrated in Figure 8.
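A minimal sketch of these redefined operators, using the parameter bounds given above (δ ∈ [0, 2], σ ∈ [0, 6]); the genome encoding, names and mutation rate are our own illustration, not the paper's implementation:

```python
import random

# Hypothetical genome encoding: one gene per jet parameter.
BOUNDS = {"dx": 2, "dy": 2, "sx": 6, "sy": 6}  # delta <= 2, sigma <= 6

def mutate(genome, rate=0.25):
    """Randomly re-draw individual genes of a jet feature genome."""
    child = dict(genome)
    for gene, bound in BOUNDS.items():
        if random.random() < rate:
            child[gene] = random.randint(0, bound)
    return child

def crossover(mother, father):
    """Copy each gene of the child from one parent, chosen at random."""
    return {g: random.choice((mother[g], father[g])) for g in BOUNDS}
```

For example, crossing {dx:1, dy:0, sx:3, sy:5} with {dx:0, dy:2, sx:1, sy:0} yields a child whose every gene comes from one of the two parents.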

Advantages in Software
The Jet Feature transformation can be calculated with a series of matrix shifts, additions and subtractions. Since the elements of the basis kernels for the transformations are either 1 or −1, there is no need for general convolution with matrix multiplication. Instead, a jet transform can be applied to image A by making a copy of A, shifting it by one pixel in either the x or y direction and then adding it to or subtracting it from the original. Padding is not used. Using jet transforms, there is no need for multiplication or division operations. We recognize that a normalization is normally used with traditional kernels; however, since this normalization is applied equally to all elements of an input image and the output values are fed into a classifier, we argue that the only difference normalization makes is to keep the intermediate values of the image representation reasonably small. In practice, we see no improvement in accuracy by normalizing during the Jet Feature transform.
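As a sketch of this shift-and-add scheme (our own illustrative code; the function names and the sign convention of the derivative are assumptions, not the paper's implementation), a full jet transform reduces to array slicing, addition and subtraction, with no multiplies and no padding:

```python
import numpy as np

def apply_block(img, axis, sign):
    """One building block: axis 1 = x (columns), axis 0 = y (rows);
    sign +1 for a scaling factor, -1 for a partial derivative."""
    if axis == 1:
        return img[:, 1:] + sign * img[:, :-1]
    return img[1:, :] + sign * img[:-1, :]

def jet_transform(img, dx, dy, sx, sy):
    """Apply the Jet Feature defined by its four parameters."""
    out = img.astype(np.int32)
    for _ in range(sx):
        out = apply_block(out, axis=1, sign=+1)
    for _ in range(sy):
        out = apply_block(out, axis=0, sign=+1)
    for _ in range(dx):
        out = apply_block(out, axis=1, sign=-1)
    for _ in range(dy):
        out = apply_block(out, axis=0, sign=-1)
    return out
```

Each building block shrinks the image by one pixel along its axis, matching the no-padding behavior described above.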
Another property of Jet Features that allows for efficient computation is the fact that a Jet Feature of a higher order can be calculated from the result of a Jet Feature of a lower order. The outputs of the lowest order Jet Features can be used as inputs to any other ECO Jet Feature that has parameters of equal or greater value. Calculating all of the Jet Features in an ensemble in the shortest amount of time can be seen as an optimization problem, where the order in which features are calculated is optimized to require the minimum number of operations. We explored optimization strategies that would find the best order for a given ensemble of Jet Features, but did not see much improvement from complex scheduling strategies. The most effective and simple strategy was calculating features with the lowest sum of δ_x, δ_y, σ_x and σ_y first and working up to higher ordered features, reusing lower ordered outputs where possible.
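This simple strategy can be sketched as follows (an illustration under our own naming; each feature is a tuple of the four parameters, and `apply_one_block` stands in for one building-block convolution):

```python
def compute_ensemble(img, features, apply_one_block):
    """Compute every feature, lowest total order first, reusing cached
    lower-order outputs. Each feature is a tuple (dx, dy, sx, sy)."""
    cache = {(0, 0, 0, 0): img}
    for feat in sorted(features, key=sum):
        # closest cached result that feat can be built on top of
        best = max((c for c in cache
                    if all(ci <= fi for ci, fi in zip(c, feat))),
                   key=sum)
        out = cache[best]
        for i in range(4):  # apply only the missing building blocks
            for _ in range(feat[i] - best[i]):
                out = apply_one_block(out, i)
        cache[feat] = out
    return cache
```

With features (1,0,0,0) and (2,0,0,0) in an ensemble, for example, the second is obtained with a single extra convolution rather than two.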

Advantages in Hardware
Jet Features were developed to make our new algorithm simpler to implement in hardware than the original ECO Features algorithm, which has several attributes that make a hardware implementation difficult. The advantages discussed in Section 4.2 are even more pronounced in hardware.
First, the original algorithm forms features from a generic pool of image transforms. This is relatively straightforward to implement in software when a computer vision library is available, only requiring extra room in memory for the library calls. In hardware, however, physical space in silicon must be dedicated to units that perform each of these transforms. The jet feature transform utilizes a set of simple operations that are reused in every single Jet Feature.
Second, the transforms of the original algorithm are not commutative: the order in which they are executed affects the output. Intermediate calculations would need the ability to be routed from every transform to every other transform. This kind of complexity could be tackled with a central memory, a bus system, redundant transform modules and/or a scheduler. The jet transform, in contrast, is commutative and the order of convolutions does not matter, so routing intermediate calculations becomes trivial.
Third, intermediate calculations from the original ECO Feature transformations can rarely be reused in any other ECO Feature. Jet Features, on the other hand, build on one another: the output of a lower order feature can serve as the input to higher order features. Using this property, the ECO Jet Features algorithm is easily pipelined and calculations for multiple features can be carried out simultaneously. In fact, instead of scheduling the order in which features are calculated, our architecture calculates every possible feature every time an input image is received. This allows for easy reprogrammability for different applications: the feature outputs required for a specific model are used and the others are ignored. Little extra hardware is required, and there is no need for a dynamic control unit.
Fourth, calculating Jet Features in hardware requires only addition and subtraction operators in conjunction with pixel buffers. The transforms of the original ECO Features algorithm require multiplication, division, procedural algorithm control, logarithm operators, square root operators and more to implement all of the transforms available to the algorithm. In hardware, these operations can require large areas of silicon and can create bottlenecks in the pipeline. As mentioned in Section 4.2, the Gaussian blur does require a division by two when normalizing. However, with a fixed base-two number system, this does not require any extra hardware; it is merely a shift of the implied binary point.

Hardware Architecture
The ECO Jet Features hardware architecture consists of two major parts: a Jet Features unit and a classifier unit. A simple routing module connects the two, as shown in Figure 9. Input image data is fed into the Jet Features unit as a series of pixels, one pixel at a time. This type of serial output is common for image sensors, but we acknowledge that if the ECO Jet Features algorithm were embedded close to the image sensor, other more efficient types of data transfer would be possible. As the data is piped through the Jet Features unit, every possible jet feature transform is calculated. Only the features that are relevant to the specific loaded model are routed on to the classifier unit. The classifier unit contains a random forest for every ECO Jet Feature in the model, and the appropriate output from the Jet Features unit is processed by the corresponding random forest.

The Jet Features Unit
The jet features unit calculates every feature for a given multiscale local jet. An input image is fed into the unit one pixel at a time, in row major order. As pixels are piped through the unit, it produces multiple streams of pixels, one stream for every feature in the jet.
All convolutions in jet feature transforms require the addition or subtraction of two pixels. This is accomplished by feeding the pixels into a buffer, where the incoming pixel is added to or subtracted from the pixel at the end of the buffer, as shown in Figure 10. Convolutions in the x direction (along the rows) require only a single pixel to be buffered, due to the fact that the image sensor transmits pixels in row-major order. Convolutions in the y direction, however, require pixel buffers the width of the input image: a pixel must wait until a whole row of pixels has been read in before its neighboring pixel is fed into the system. With units for convolution in both the x and y directions, an array of convolutional units is connected to produce every jet feature for a given multiscale local jet. Each ECO Jet Feature can consist of multiple scaling factors and partial derivatives in both the x and y directions. With these four basic types of building blocks, we construct the ECO Jet Features unit to produce every possible feature with a four-dimensional array of convolution units, one dimension for each type of building block. A four-dimensional array can grow quite large as the maximum number of allowed multiples of each convolution type gets larger.
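The buffering scheme can be modeled in a few lines (a behavioral sketch with hypothetical names, not the SystemVerilog design; row-boundary handling is omitted for brevity): an x-direction unit holds one pixel, while a y-direction unit holds a full line.

```python
from collections import deque

class StreamBlock:
    """One convolution unit, fed a pixel per clock in row-major order."""
    def __init__(self, image_width, axis, sign):
        # x: buffer 1 pixel; y: buffer a whole image row (line buffer)
        self.buf = deque(maxlen=1 if axis == "x" else image_width)
        self.sign = sign  # +1 for a scaling factor, -1 for a derivative

    def push(self, pixel):
        """Feed one pixel in; returns an output pixel once primed."""
        out = None
        if len(self.buf) == self.buf.maxlen:
            out = pixel + self.sign * self.buf[0]
        self.buf.append(pixel)
        return out
```

A y-direction scaling unit over a 3-pixel-wide image, for instance, emits nothing during its first row and then streams out the vertical pixel sums.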
By restricting the multiscale local jet, there are fewer possible Jet Features. In order to see the effect of restricting the maximum allowed values for δ and σ, we tested various configurations on the BYU Fish Species dataset, which is further explained in Section 6. We restricted both δ and σ to maximums of 15, 10 and 5, and each of these configurations was trained and tested. We observed that the genetic algorithm often selected either 0 or 1 as values for δ, so a configuration where δ ≤ 1 and σ ≤ 5 was tested as well. Figure 11 shows the average test accuracy for each of these configurations as the model is being trained. From these results we feel confident that restricting δ and σ does not hurt the algorithm's accuracy significantly. It does, however, restrict the space significantly, which can mean a much more compact hardware design. In our hardware architecture we restrict δ ≤ 1 and σ ≤ 5, which gives only 144 different possible Jet Features. With restrictions on the maximum values for δ and σ, the size of the ECO Jet Features unit can be kept fairly small. The arrangement within the array can also help to reduce the size of the unit. Since the order of the convolutions does not matter, placing the convolutional units for the y direction first and reusing their outputs for all the other combinations requires fewer resources, since the y direction convolutions require whole line buffers. Figure 12 shows how this array is arranged.

Figure 10. The second dimension consists of a single partial derivative factor for each of the outputs of the previous dimension and the input (δ ≤ 1). The angled blocks represent further convolutions, suggesting the third and fourth dimensions.
Computing every possible ECO Feature allows for a fully pipelined design that does not need to be reconfigured for different ECO Jet Feature models. If a new model is trained, the design does not need to be re-synthesized to use the ECO Jet Features of the new model. The newly selected features are simply routed to the classifiers instead of the old ones.

The Random Forest Unit
Each of the selected ECO Jet Features is connected to its own random forest. Figure 13 shows how pixel data from each ECO Jet Feature is sent to a random forest unit. Once enough pixel data has been fed into a forest to make a valid prediction, a valid signal is asserted and the predicted value is sent to a prediction unit. With valid predictions from all random forest units, the prediction unit tabulates the votes and produces a final prediction for the whole ECO Jet Features system. Votes from the random forest units are weighted inside the prediction unit according to the SAMME AdaBoost algorithm (see Section 3). A random forest is a collection of decision trees; Figure 13 depicts the individual trees inside each forest. Each tree votes individually for a particular class based on the values of specific incoming pixels. Nodes of these trees act like gates that pass their input to one of two outputs. Each node is associated with a specific pixel location and value. If the actual value of that pixel is less than the value associated with the node, the "left" output is opened. If the actual value is greater, the "right" output is opened. Each leaf node holds a prediction for the classification of the input image. Once a path is opened from the root of the tree to a leaf node, the prediction associated with that leaf is produced by the tree and a valid signal is asserted.
The hardware implementation for the decision tree splits the tree into four main components: node data (pixel values to compare with), pixel-node comparison unit, node structure unit and leaf prediction data, as shown in Figure 14.
The pixel value and pixel location information needed to evaluate every node in the tree is stored in the node data unit. Since pixels are streamed into the tree in row-major order (left to right, top to bottom), the node data is stored in the order in which the associated pixels will arrive. The node data unit uses a pointer to note which node and pixel will arrive next. Once the pixel data arrives, the pixel-node comparison unit compares the pixel value with the value of the node and lets the tree structure unit know whether the pixel was greater or less than the value in the node. The tree structure unit keeps track of which nodes have been evaluated and which branches of the tree have been activated. Once a path from the root to a leaf has been activated, the prediction unit is signaled and a prediction value is sent to the output.

Figure 14. The hardware structure of a decision tree. The node data unit holds the pixel value and pixel location information for the pixel each node is compared to. It outputs the data for the next pixel that will be streamed in. The node-pixel comparison unit waits for a match between the incoming pixel location from the input and the next pixel location from the node data unit. The pixel and node values are compared and the result is sent to the node structure unit. Once a leaf is activated in the tree structure, the prediction unit is signaled and a prediction is issued from the tree.
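The streaming evaluation can be sketched in software as follows (an illustrative model with hypothetical data structures, not the actual hardware): node comparisons are resolved as their pixels arrive, and the prediction is read off once a root-to-leaf path is open.

```python
def stream_predict(pixels, nodes, children, leaves):
    """
    pixels:   iterable of (location, value), in row-major arrival order
    nodes:    list of (location, threshold), sorted by arrival order
    children: maps (node_index, went_left) -> ("node", i) or ("leaf", i)
    leaves:   list of class predictions
    """
    went_left = {}
    i = 0
    for loc, val in pixels:
        if i < len(nodes) and nodes[i][0] == loc:
            went_left[i] = val < nodes[i][1]  # True opens the "left" branch
            i += 1
    # walk the activated path from the root (node 0) down to a leaf
    node = 0
    while True:
        kind, idx = children[(node, went_left[node])]
        if kind == "leaf":
            return leaves[idx]
        node = idx
```

A depth-one tree whose single node compares pixel 0 against a threshold of 128, for example, routes a darker pixel to the left leaf and a brighter one to the right leaf.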
In order to select an efficient configuration, we experimented with different sizes of random forests and different numbers of Jet Features. We varied the number of trees in a forest, the maximum depth of each tree and the total number of creatures, which corresponds to the number of forests. Figure 15 shows the accuracy of these different configurations compared with the total count of nodes in their forests. A pattern of diminishing returns is apparent as models grow to more than 3000 nodes. The models that performed best were ones with at least 10 creatures and a balance between tree count and tree depth. We used a configuration of 10 features with random forests of 5 trees, each 5 levels deep. This setup requires 3150 nodes.

Datasets
ECO Features was designed to solve visual inspection applications. These applications typically involve fixed camera conditions where the objects being inspected are similar. This includes manufactured goods that are being inspected for defects or agricultural produce that is being graded for size or quality. These applications are usually fairly specific, and real world users do not have extremely large datasets.
We first explore the accuracy of ECO Features and ECO Jet Features on the MNIST and CIFAR-10 datasets. Both are common datasets used in deep learning, made up of small images with only 10 classes each. The MNIST dataset consists of 70,000 28 × 28 pixel images (10,000 reserved for testing) and the CIFAR-10 dataset consists of 60,000 32 × 32 pixel images (10,000 reserved for testing). The MNIST dataset features handwritten digits and the CIFAR-10 images each contain one of various objects. Examples are shown in Figure 16.
We also tested our algorithms on a dataset that is more typical for visual inspection tasks. MNIST and CIFAR-10 contain many more images than what is typically available to users solving specific visual inspection tasks. Visual inspection training sets also include less variation in object type and camera conditions than in the CIFAR-10 dataset. The MNIST and CIFAR-10 datasets consist of small images, which makes execution time atypically fast for visual inspection applications. For these reasons we also used the BYU Fish dataset in our experimentation.
The BYU Fish dataset consists of images of fish from eight different species. The images are 161 pixels wide by 46 pixels tall. We split the dataset to include 778 training images and 254 test images. Images were converted to grayscale before being passed to the algorithm. Each specimen is oriented in the same way and the camera pose remains constant. This type of dataset is typical for visual inspection systems where camera conditions are fixed and a relatively small number of examples are available. Examples are shown in Figure 17.

Accuracy on MNIST and CIFAR-10
To get a feel for how Jet Features change the capacity of ECO Features to learn, we trained the ECO Features algorithm and the ECO Jet Features algorithm on the MNIST and CIFAR-10 datasets. These datasets have many images and were specifically designed for deep learning algorithms, which can take advantage of such large training sets. We note that the absolute accuracy of these models does not compare well with state-of-the-art deep learning, but we use these larger datasets to fully test the capacity of our ECO Jet Features in comparison to the original ECO Features algorithm.
The images in the MNIST dataset have uniform scale and orientation. The images in the CIFAR-10 dataset are not as well conditioned. For this reason we see a higher error rate in the CIFAR-10 dataset than in the MNIST dataset. The ECO Jet Features algorithm is designed to classify images that are fairly uniform, like those in visual inspection applications where camera conditions are fixed. We include the CIFAR-10 dataset results as a means to compare the original ECO Features algorithm and the ECO Jet Features algorithm.
Each model was trained with random forests of 15 trees up to 15 levels deep. When testing on CIFAR-10, each model was trained with 200 creatures, and the accuracy as features were added is shown in Figure 18. The models were only trained to 100 creatures on MNIST, where the models seem to converge, as shown in Figure 19. The CIFAR-10 results show that the models converge to similar accuracy, while ECO Jet Features show a slight improvement (about 0.3%) over the original algorithm on MNIST. From these results we conclude that Jet Features introduce no noticeable loss in accuracy.

Accuracy on BYU Fish Dataset
We also trained on the BYU Fish dataset with the same experimental setup that was used on the other datasets. The results are plotted in Figure 20. While the models do seem to converge to a similar accuracy, results from training on such a small dataset may not be quite as convincing as those obtained using larger datasets. These results are included for completeness, since this dataset was used in our procedure and is meant for testing speed, efficiency and model size.

Software Speed Comparison
While the primary objective of the new algorithm is to be hardware friendly, it is interesting to explore the speedup gained in software. Each algorithm was implemented on a full-sized desktop PC with an Intel Skylake i7 processor, using the OpenCV library. OpenCV contains built-in functionality for most of the transforms from the original ECO Features algorithm. It also provides vectorization for the Jet Feature operations.
We attempted to accelerate these algorithms using GPUs, but found this was only possible on images that were larger than 1024 × 768. Even using images that were this large did not provide much acceleration. The low computational cost of the algorithm does not justify the overhead of CPU to GPU data transfer.
A model of 30 features was created for both algorithms. The BYU Fish dataset was used because its image sizes are more typical of real world applications. The original algorithm averaged a run time of 10.95 ms and our new ECO Jet Features algorithm averaged an execution time of 2.95 ms, a 3.7× speedup.

Hardware Implementation Results
Our hardware architecture was designed in SystemVerilog. It was synthesized and implemented for a Xilinx Virtex-7 FPGA using the Vivado design suite. Based on our analysis reported in Section 5 (Figures 11 and 15), we implemented a model with 10 features, 5 trees in each forest with a depth of 5, a maximum σ of 5 and a maximum δ of 1. We used a round number of 100 pixels for the input image width. A model built around the BYU Fish dataset would have required only 46 pixels in its line buffers, but this length is small due to the oblong nature of fish; we feel a width of 100 pixels is more representative of general visual inspection tasks.
The total utilization of available resources on the Xilinx Virtex-7 is reported in Table 2. One interesting point is that this architecture requires no Block RAM (BRAM) or Digital Signal Processing (DSP) units. BRAMs are dedicated RAM blocks that are built into the FPGA fabric. DSPs are generally used for more complex arithmetic operations, like general convolution. Our architecture, however, is compact enough and simple enough not to require either of these resources, and instead hosts all of its logic in the main FPGA fabric. Look Up Tables (LUTs) make up the majority of the fabric area and are used to store all data and perform logic operations. To give a quick reference of FPGA utilization for a CNN on a similar Virtex-7 FPGA, Prost-Boucle et al. [39] reported using 22% to 74.4% of the 52.6 Mb of total BRAM memory for various sizes of their model. Our model did not require any of these BRAM memory units. Comparing the number of LUTs used as logic, Prost-Boucle et al. used 112% more than our model in their smallest model and 769% more in their larger, more accurate model.
The pixel clock speed can safely reach 200 MHz. Since the design is fully pipelined around the pixel clock, images from the BYU Fish dataset could, in theory, be processed in 37 µs. This is a 78.3× speedup over the software implementation on a full-sized desktop PC. A custom silicon design could be even faster than this FPGA implementation. Table 3 shows the relative sizes of the individual units of the design. Some FPGA logic slices are shared between units, so the sum of the individual unit counts exceeds the totals listed in Table 2. With a setup of 10 creatures and 5 trees per forest with a depth of 5, the Jet Features unit makes up about 70% of the total design. However, since this unit generates every jet in the multiscale local jet, it does not grow as more features are added to the model. We showed in Figure 11 that using a large local jet does not necessarily improve performance. The Random Forest unit makes up less than 35% of the design in all aspects other than LUT units used as memory, which is a subset of total LUTs. But only 10 features were used, and more could be added to increase accuracy, as shown in Figure 20. Extrapolating from these numbers, if all 144 possible features were added to this design, only 30% of the resources available on the Virtex-7 would be used and 87.9% of them would be dedicated to the Random Forest unit.
These results show how compact this architecture is. The simple operations and feed forward paths used in this design could very feasibly be implemented in custom silicon as well.

Conclusions
We have presented Jet Features, learned convolutional kernels that are efficient in both software and hardware implementations. We applied them to the ECO Features algorithm. This change to the algorithm allows faster software execution and hardware implementation. In software, the algorithm experiences a 3.7× speedup with no noticeable loss in accuracy. We also presented a compact hardware architecture for our new algorithm that is fully pipelined and parallel. On an FPGA, this architecture can process images in 37 µs, a 78.3× speedup over the improved software implementation.
Jet Features are related to the idea of multiscale local jets. Large groups of these transforms can be calculated in parallel. They incorporate many other common image transforms, such as the Gaussian blur, Sobel edge detector and Laplacian filter. The simple operators required to calculate Jet Features allow them to be easily implemented in hardware in a completely pipelined and parallel fashion.
With a compact classification architecture for visual inspection, automatic visual inspection logic can be embedded into image sensors and compact hardware systems. Visual inspection systems can be made smaller, cheaper and available to a wider range of visual inspection applications.