Automatic Discrimination between Scomber japonicus and Scomber australasicus by Geometric and Texture Features

Abstract: This paper proposes a method for automatic discrimination of two mackerel species: Scomber japonicus (chub mackerel) and Scomber australasicus (blue mackerel). Because S. japonicus has a much higher market price than S. australasicus, the two species must be properly sorted before shipment, but their similar appearance makes discrimination difficult. These species can be effectively distinguished using the ratio of the base length between the dorsal fin's first and ninth spines to the fork length. However, manual measurement of this ratio is time-consuming and reduces fish freshness. The proposed technique instead uses image processing to measure these lengths. We were able to successfully discriminate between the two species using the ratio as a geometric feature, in combination with several texture features. We then quantitatively verified the effectiveness of the proposed method and demonstrated that it is highly accurate in classifying mackerel.


Introduction
Mackerel is a major aquatic resource worldwide. In Japan, Scomber japonicus (chub mackerel) and Scomber australasicus (blue mackerel), shown in Figure 1, are the two most common species and are primarily caught together along the coasts [1]. The former has a far higher market price than the latter, necessitating discrimination before shipment. A typical difference between these two species is the texture on the ventral portion of their sides. In S. japonicus, this area is silver-white, whereas S. australasicus has spot patterns. In addition, the front view of S. japonicus is elliptical, while that of S. australasicus is circular. Generally, the fishing industry separates the mackerel based on these visible characteristics. However, some individuals have characteristics of both species and are difficult even for experts to discriminate (e.g., Figure 2).

Figure 2.
A mackerel that is difficult to classify. The specimen looks like S. japonicus, but has light spot patterns, which are characteristic of S. australasicus.
The two species also tend to differ in the characteristics of the first dorsal fin, with S. japonicus generally possessing 9 or 10 spines, while S. australasicus has 10 to 13 spines [2]. However, this method of discrimination is not always reliable because spine number exhibits some individual variation. Of the available options, the most accurate identification method is to first determine the base length between the first and ninth spines of the first dorsal fin (the base length between spines, BLBS), and then calculate the ratio of BLBS to the fork length (see Figure 3). If the ratio is greater than 0.12, the fish is S. japonicus; otherwise, it is S. australasicus. According to a report by the National Research Institute of Fisheries Science in Japan [3], this method has an accuracy of about 99%. In our previous study presented at a workshop [4], we verified the effectiveness of this method by manually detecting the fin region. However, such measurements are performed manually on individual fish, which is a laborious process that can also decrease freshness. Therefore, manual measurement is not suitable for quickly separating a large mackerel catch, and the fishing industry would benefit from an automatic identification method.
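The ratio rule described above can be written as a one-line decision. The following is a minimal sketch, assuming both lengths are measured in the same unit; the function name and the example lengths are hypothetical and are not taken from the study.

```python
# Threshold on the BLBS / fork-length ratio quoted in the text above.
JAPONICUS_RATIO_THRESHOLD = 0.12


def classify_by_ratio(blbs: float, fork_length: float) -> str:
    """Return the species suggested by the BLBS to fork length ratio alone."""
    if fork_length <= 0:
        raise ValueError("fork_length must be positive")
    ratio = blbs / fork_length
    # A ratio above the threshold indicates S. japonicus; otherwise S. australasicus.
    return "S. japonicus" if ratio > JAPONICUS_RATIO_THRESHOLD else "S. australasicus"


# Hypothetical measurements (same unit for both lengths).
print(classify_by_ratio(39.0, 300.0))  # ratio 0.13
print(classify_by_ratio(33.0, 300.0))  # ratio 0.11
```

The difficulty addressed in the rest of the paper lies in obtaining the two lengths automatically, not in applying the threshold.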

Related Work
Some studies have examined techniques for automatic discrimination between fish. The existing methods are roughly classified into two types.
The first is more relevant for marine biology, aiming to detect and classify fish underwater. For example, a deformable template [5] can be applied that matches fish based on the shape context [6] and distance transformation [7]. A system is also available for detecting, tracking, and classifying fish in behavioral studies [8]. Fish were first detected through a combination of a Gaussian mixture model and a moving average algorithm. Next, fish were classified using texture features derived from Gabor filters and the gray level co-occurrence matrix (GLCM) [9], as well as shape features calculated with the curvature scale space transformation [10]. Contour detection has also been applied to fish [11], specifically using the Canny edge detector [12] and blob detection with the Laplacian of Gaussian filter, before final species identification with Zernike moments [13]. A combination of detection with Gaussian mixture models and tracking by the Kalman filter has been used as well [14], with subsequent species identification performed using a pyramid histogram of visual words (PHOW) [15]. Hasija et al. [16] classified fish species based on image-set matching with subspaces; accuracy was improved through discriminant analysis plus graphical representation.
The second type of research has greater industrial application because it is aimed at discriminating individual caught fish. Fouad et al. developed a machine-learning method for discriminating Nile tilapia from other Nile River fishes [17]. Both scale invariant feature transformation (SIFT) [18] and speeded up robust features (SURF) [19] were separately applied to extract distinct features from a whole image of the fish. Next, artificial neural networks (ANN), K-nearest neighbor (K-NN), and support vector machines (SVM) [20] were employed to separate the Nile tilapia from other species. The results indicated that the combination of SURF and SVM achieved extremely high accuracy. Rodrigues et al. proposed five schemes for fish classification [21]. Distinguishing features were identified using principal component analysis (PCA), SIFT, and vector of locally aggregated descriptors [22]. Identified features were then clustered with artificial immune networks (aiNet) [23], adaptive radius immune algorithms (ARIA) [24], and k-means algorithms. The results indicated that PCA was the most effective for feature identification, while aiNet or ARIA were best for clustering. A classification method for three types of tuna (bigeye, yellowfin, and skipjack) was developed using decision trees with shape and texture features [25]. The circular rate of the head was selected as a shape feature. Next, GLCM was used to extract texture features, including contrast, correlation, energy, and other relevant factors. Based on a decision tree constructed with the extracted features, test images were classified into one of the three tuna species. The study employed texture features because skipjacks have a distinctive characteristic of multiple linear patterns on the abdomen.
All of the existing techniques used texture features or the fish's rough shape for discrimination. However, since the appearances of S. japonicus and S. australasicus are very similar, it is difficult to discriminate them by these features only. In this study, we propose an accurate discrimination method for mackerel using both textural and geometric features from the first dorsal fin. The proposed approach is in the second category described above.

Materials and Methods
The flow of the proposed method is shown in Figure 4. First, segmentation is performed on a given input image and fish location is determined, followed by fork-length measurement. Next, geometric and texture features are extracted from the image. Specifically, the relevant geometric feature is the ratio of BLBS to fork length, while the texture feature is calculated based on the GLCM in the ventral half of the fish's side. The two features are then combined and used for classification with the SVM. Below, we describe the details of each step.

Input
Input images are RGB color photographs of a mackerel laid horizontally with its head on the viewer's left. Examples are shown in Figure 1.

Segmentation
Edge detection and mathematical morphology techniques [26] (e.g., dilation and erosion) are used for segmentation. First, the input image is converted to gray scale for edge detection with the Sobel filter [27]. Figure 5a shows the results of edge detection on the image in Figure 1b, revealing the fish's contours clearly. Next, dilation is applied to two vertically adjacent and then two horizontally adjacent pixels. The resultant hole is filled with the edge pixel value, and the pixel touching the image boundary is erased. Figure 5b shows a representative result of dilation, wherein the fish area was roughly extracted.
The dilation-derived fish area is larger than the actual area, and some noise pixels remain. Therefore, erosion is applied three times to each pixel and its four-neighbor pixels, thus completing segmentation. As seen in the final segmentation image (Figure 5c), erosion eliminated noise, resulting in an area that was approximately the same as the fish's actual size.
Finally, fish location is estimated from this finalized image. The smallest rectangle surrounding each connected component is identified and its area is calculated. Fish location is set as the connected component with the largest surrounding rectangle (Figure 5d).
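The last step, choosing the connected component with the largest surrounding rectangle, can be sketched as follows. This is an illustrative sketch only: it assumes the segmented image is a list of rows of 0/1 values and uses 4-connectivity, neither of which is specified in the text, and the function name is hypothetical.

```python
from collections import deque


def largest_bbox_component(mask):
    """Return (bbox, pixels) of the component whose bounding rectangle has the largest area.

    mask: list of rows of 0/1 values (assumed representation).
    bbox: (top, left, bottom, right), all inclusive. Returns None if mask has no 1s.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = None  # (area, bbox, pixels)
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill over 4-connected neighbors (assumption).
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < height and 0 <= nc < width \
                            and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rows = [p[0] for p in pixels]
            cols = [p[1] for p in pixels]
            bbox = (min(rows), min(cols), max(rows), max(cols))
            area = (bbox[2] - bbox[0] + 1) * (bbox[3] - bbox[1] + 1)
            if best is None or area > best[0]:
                best = (area, bbox, pixels)
    return None if best is None else (best[1], best[2])


mask = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
]
bbox, pixels = largest_bbox_component(mask)
print(bbox, len(pixels))
```

Ranking by bounding-rectangle area rather than pixel count follows the description above; for an elongated object such as a fish the two criteria usually agree.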

Fork-Length Measurement
Fork length is measured from the tip of the snout (s_c) to the fork of the tail (t_c) (Figure 6). Thus, these points are first identified in the image as follows.
The distance from the upper right corner of the input image to each contour point is calculated. The minimum-distance contour point is considered the top of the caudal fin (t_t). Similarly, the bottom of the caudal fin (t_b) is determined based on the distance from the lower right corner of the input image. The fork of the tail (t_c) is determined as the leftmost contour point encountered when moving from t_t to t_b in a clockwise direction. The tip of the snout (s_c) is determined as the contour point at the maximum distance from t_c. Finally, the fork length is calculated as the distance between s_c and t_c.
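The landmark search described above can be sketched in a few lines. The sketch assumes the contour is available as a list of (x, y) points ordered clockwise as displayed on screen (y grows downward), which the text does not state; the function name and the toy contour are hypothetical.

```python
import math


def fork_length_landmarks(contour, width, height):
    """Return (s_c, t_c, fork_length) from an ordered contour.

    contour: list of (x, y) points in clockwise screen order (assumption).
    width, height: size of the input image in pixels.
    """
    def nearest_index(corner):
        return min(range(len(contour)), key=lambda i: math.dist(contour[i], corner))

    i_top = nearest_index((width - 1, 0))              # t_t: nearest to upper right corner
    i_bottom = nearest_index((width - 1, height - 1))  # t_b: nearest to lower right corner

    # Walk clockwise from t_t to t_b and collect the points in between.
    segment = []
    i = i_top
    while True:
        segment.append(contour[i])
        if i == i_bottom:
            break
        i = (i + 1) % len(contour)

    t_c = min(segment, key=lambda p: p[0])                # leftmost point: fork of the tail
    s_c = max(contour, key=lambda p: math.dist(p, t_c))   # farthest point: tip of the snout
    return s_c, t_c, math.dist(s_c, t_c)


# Toy fish outline: snout at the left, forked tail at the right.
contour = [(0, 5), (5, 2), (15, 0), (12, 5), (15, 10), (5, 8)]
print(fork_length_landmarks(contour, width=16, height=11))
```

Because the head is assumed to be on the viewer's left, the walk from t_t to t_b passes through the tail notch and not around the body.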

Measurement of Base Length between Spines
This measurement involves rough detection of the first dorsal fin, followed by detection of the positions of the spines. The base length between the first and ninth spines (BLBS) is then calculated.

Rough Detection of First Dorsal Fin
First, the region of the first dorsal fin is roughly detected from the input image by identifying the peak (D_p), starting (D_s), and end (D_e) points of the first dorsal fin (Figure 7). The peak point (D_p) is the contour point with the minimum y-coordinate. Next, p_u is the point on the contour that has the same x-coordinate as the point bisecting the line segment connecting D_p and s_c. The starting point of the first dorsal fin (D_s) is the farthest point on the contour from the straight line connecting D_p and p_u. Similarly, p_t is the point on the contour that has the same x-coordinate as the point bisecting the line segment connecting D_p and t_c. Lastly, the end of the first dorsal fin (D_e) is the farthest point on the contour between D_p and p_t from the straight line through both points.
The rough region of the first dorsal fin is then detected as the rectangular region defined by D_p, D_s, and D_e (red rectangle in Figure 7). The bottom of this region is the larger of the y-coordinates of D_s and D_e. This region is called the fin-region hereafter.
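The construction of D_p, D_s, D_e, and the fin-region can be illustrated as below. This is a sketch under several assumptions not stated in the text: only the upper contour is used, it is given as (x, y) points with y growing downward, the contour point "at" a given x is taken as the one with the nearest x, and the search for D_s is restricted to the points between p_u and D_p. All names are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b (a != b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.hypot(bx - ax, by - ay)


def rough_fin_region(upper, s_c, t_c):
    """Return (D_p, D_s, D_e, region); region is (left, top, right, bottom)."""
    d_p = min(upper, key=lambda p: p[1])  # peak: minimum y-coordinate

    def contour_point_at(x):
        return min(upper, key=lambda p: abs(p[0] - x))

    p_u = contour_point_at((d_p[0] + s_c[0]) / 2)  # above the midpoint of D_p and s_c
    p_t = contour_point_at((d_p[0] + t_c[0]) / 2)  # above the midpoint of D_p and t_c

    front = [p for p in upper if p_u[0] <= p[0] <= d_p[0]]
    back = [p for p in upper if d_p[0] <= p[0] <= p_t[0]]
    d_s = max(front, key=lambda p: point_line_distance(p, d_p, p_u))
    d_e = max(back, key=lambda p: point_line_distance(p, d_p, p_t))

    # The bottom edge is the larger y-coordinate of D_s and D_e.
    region = (d_s[0], d_p[1], d_e[0], max(d_s[1], d_e[1]))
    return d_p, d_s, d_e, region


# Toy upper contour: flat back at y = 10 with a triangular fin peaking at (10, 2).
upper = [(2, 11), (3, 10), (4, 10), (5, 10), (6, 10), (7, 10), (8, 10), (9, 6),
         (10, 2), (11, 6), (12, 10), (13, 10), (14, 10), (15, 10), (16, 10),
         (17, 10), (18, 11)]
print(rough_fin_region(upper, s_c=(2, 11), t_c=(18, 11)))
```

The farthest-point criterion picks the concave corner where the fin meets the back, which is why the chord endpoints p_u and p_t are placed away from the fin itself.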

Detection of Spine Positions
Next, spines are detected within the fin-region. Let the height and the width of the fin-region be h and w_r, respectively. Since D_e is often not exactly the end point of the dorsal fin, the region in which spines are detected is made wider than the fin-region, with its leftmost point fixed at D_s. If h/w_r is smaller than 1.05 (which corresponds to a case where part of the fin is missing), the region's width w is calculated as w = 1.7 w_r; otherwise, it is calculated as w = 1.15 w_r. An example of the region in which spines are detected is shown in Figure 8.
We then apply Gaussian smoothing and binarize the image. Binarization follows a method based on Niblack's local thresholding [28], which determines a threshold for each pixel from a small region surrounding that pixel. Here, the threshold T(x, y) of pixel (x, y) is computed from m(x, y) and s(x, y), the mean and standard deviation of the small region surrounding pixel (x, y), together with two parameters set to k = 0.7 and offset = 8.5. In Niblack's original method, offset = 0 and the threshold is T(x, y) = m(x, y) + k s(x, y). The small region is 7 × 7 pixels centered at (x, y). If the pixel value of the smoothed image is larger than the threshold T(x, y), the pixel is considered part of the spine region.
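Two of the rules above lend themselves to a short illustration: the widening of the search region and the local threshold. The sketch below covers only Niblack's original rule (offset = 0), T = m + k s; how the offset enters the modified threshold is deliberately left out. Truncating the window at the image border and using the population standard deviation are assumptions, and the function names are hypothetical.

```python
import math


def spine_search_width(h, w_r):
    """Width of the spine-search region from the fin-region height h and width w_r."""
    return 1.7 * w_r if h / w_r < 1.05 else 1.15 * w_r


def niblack_threshold(image, x, y, k=0.7, window=7):
    """Niblack's original local threshold m + k*s at pixel (x, y) (offset = 0 case only).

    image: list of rows of gray values. The window is clipped at the border (assumption).
    """
    half = window // 2
    values = [image[j][i]
              for j in range(max(0, y - half), min(len(image), y + half + 1))
              for i in range(max(0, x - half), min(len(image[0]), x + half + 1))]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean + k * std


print(spine_search_width(10, 10))  # h / w_r = 1.0, so the wider factor applies
image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(niblack_threshold(image, 1, 1, k=0.7, window=3))
```

A positive k raises the threshold above the local mean, so only pixels clearly brighter than their surroundings are marked as spine.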
Connected components with fewer than five pixels are deleted as noise, and holes generated by the binarization process are filled. The binarized image of Figure 8 is displayed in Figure 9. Next, the number of white pixels in each row of the binarized image is counted. If the number is greater than four-fifths of the input width, the row is excluded from histogram construction (for an example, see the blue rectangle in Figure 9). The bottom part of the remaining region is then extracted so that its height is 7% of the image height (red rectangle in Figure 9). To detect each spine, a luminance histogram (Figure 10) is generated as the average of the pixel values per column in the red rectangle. The histogram is then smoothed using the moving average method (window size = 2). The histogram's peaks represent the spines.
First, the peak value (h_pk) and prominence value (h_pr) of each histogram peak are calculated (see Figure 11). The prominence value is defined as the height of the peak above the higher of its adjacent valleys or endpoints. Next, the peak width (w_pk) and prominence width (w_pr) are determined. The peak width is defined by the intersections of the horizontal line at the peak's half height with the signal or border, where the border represents the horizontal position of the valley. The prominence width is defined by the intersections of the horizontal line at half prominence with the signal. Peaks with h_pk < 0.05, w_pk < 0.5, and w_pr < 0.5 are discarded in advance as noise. In addition, peaks are discarded as being too small when they satisfy three conditions stated in terms of median(h_pr), median(w_pr), and median(w_pk), the median values of all prominence values, prominence widths, and peak widths, respectively, and of parameters α_1, α_2, and α_3, whose values are determined in advance.
In the binarization process, there are cases in which two spines are connected and extracted as one peak. A peak is regarded as including two spines if it satisfies a condition involving a parameter α_4. In addition, when the second peak satisfies w_pr ≤ w_pk, the peak is regarded as including two spines if the distance between it and its previous peak is larger than 1.48 times the distance between it and its following peak.
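The column histogram, its smoothing, and the peak and prominence values defined above can be sketched as follows. The sketch is illustrative and rests on assumptions: the moving average is taken over the current and preceding samples, peaks are strict local maxima (plateaus and endpoints are ignored), and prominence follows the common definition of height above the higher of the two surrounding minima. Peak and prominence widths and the median-based filters are not reproduced. All function names are hypothetical.

```python
def column_profile(rows):
    """Average pixel value per column of a rectangular region (list of rows)."""
    return [sum(col) / len(col) for col in zip(*rows)]


def moving_average(values, window=2):
    """Trailing moving average; the first samples use a shorter window (assumption)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def peak_prominences(signal):
    """Return (index, peak value, prominence) for each strict local maximum."""
    peaks = []
    n = len(signal)
    for i in range(1, n - 1):
        if not (signal[i - 1] < signal[i] > signal[i + 1]):
            continue
        # Lowest value on each side before a higher sample or the endpoint is reached.
        left_base = signal[i]
        j = i - 1
        while j >= 0 and signal[j] <= signal[i]:
            left_base = min(left_base, signal[j])
            j -= 1
        right_base = signal[i]
        j = i + 1
        while j < n and signal[j] <= signal[i]:
            right_base = min(right_base, signal[j])
            j += 1
        peaks.append((i, signal[i], signal[i] - max(left_base, right_base)))
    return peaks


print(column_profile([[0, 1, 1], [0, 1, 0]]))
print(moving_average([0, 2, 4], window=2))
print(peak_prominences([0, 3, 1, 5, 2, 4, 0]))
```

In the last call, the middle peak towers over both neighbors and keeps its full height as prominence, while the two side peaks are measured only from the valley that separates them from it.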
Conversely, peaks that do not correspond to spines are deleted. We determine whether this is the case by measuring the interval between spines, which tends to grow gradually larger. Thus, each peak after the second one is discarded if the ratio of its distance to the following peak to its distance to the previous peak is less than 0.64.
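One possible reading of this spacing rule is sketched below. The text does not say whether distances are re-evaluated after a peak has been removed; the sketch assumes a single left-to-right pass in which the previous peak is the last one kept, and it keeps the final peak because it has no following peak. The function name and example positions are hypothetical.

```python
def filter_by_spacing(positions, min_ratio=0.64):
    """Drop peaks whose following gap is much shorter than their preceding gap.

    positions: strictly increasing x-positions of candidate peaks.
    """
    kept = list(positions[:2])  # the first two peaks are never tested
    for i in range(2, len(positions)):
        if i + 1 < len(positions):
            prev_gap = positions[i] - kept[-1]            # to the last kept peak
            next_gap = positions[i + 1] - positions[i]    # to the following candidate
            if next_gap / prev_gap < min_ratio:
                continue  # spacing shrinks too sharply: treat as a non-spine peak
        kept.append(positions[i])
    return kept


# The candidate at 18 is followed too closely by the peak at 21.
print(filter_by_spacing([0, 10, 18, 21, 33, 46]))
```

With this reading, a spurious peak is caught when a genuine spine follows it closely; other orderings of spurious and genuine peaks may call for a different interpretation of the rule.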
The peak corresponding to the ninth spine can be difficult to discern from the luminance histogram because that spine tends to be very short or buried in the body. Therefore, when we only detect eight spines, we estimate the ninth spine's position by adding the interval between the seventh and eighth spines to the position of the eighth spine.
Next, the position of the ninth spine is slightly adjusted. If the ratio of the eighth-to-ninth-peak interval to the seventh-to-eighth-peak interval is greater than 1.1, then the position is moved left by one-fourth of the distance between the eighth and ninth peaks. When eight peaks are detected, the position of the eighth spine is moved left by a quarter of the distance between the seventh and eighth peaks if the ratio of the seventh-to-eighth interval to the sixth-to-seventh interval is greater than 1.5.
Finally, BLBS is measured based on the positions of the first and ninth spines, allowing calculation of the BLBS to fork length ratio as the discriminating geometric feature.
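The final computation can be summarized in a short sketch. It covers the extrapolation of a missing ninth spine and the ninth-spine adjustment, then returns the BLBS to fork length ratio; the separate adjustment of the eighth spine is omitted because its ordering relative to the extrapolation is not specified. Spine positions are assumed to be x-coordinates in the same unit as the fork length, and the function name is hypothetical.

```python
def blbs_ratio(spines, fork_length):
    """Return BLBS / fork length from spine x-positions ordered left to right."""
    if len(spines) < 8:
        raise ValueError("at least eight spine positions are required")
    p7, p8 = spines[6], spines[7]
    if len(spines) == 8:
        # Ninth spine not detected: extend by the seventh-to-eighth interval.
        p9 = p8 + (p8 - p7)
    else:
        p9 = spines[8]
        # Interval grew by more than 10%: pull the ninth spine a quarter-interval left.
        if (p9 - p8) / (p8 - p7) > 1.1:
            p9 -= (p9 - p8) / 4
    blbs = p9 - spines[0]
    return blbs / fork_length


# Hypothetical positions in pixels with a fork length of 300 pixels.
eight = [0, 4, 8, 12, 16, 20, 24, 28]
print(blbs_ratio(eight, 300))          # ninth spine extrapolated to 32
print(blbs_ratio(eight + [36], 300))   # ninth spine adjusted from 36 to 34
```

The returned value is the geometric feature that is later combined with the texture features.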

Texture Feature Extraction
First, we determine the region for extracting texture features. Because the spot patterns of S. australasicus are on the ventral portion of the side, we delimit this region by first setting its upper left point to the point that internally divides the fork length in the ratio 3:5, shifted by H/41.25, where H is the height of the fish. The width and height of the region are FL/4.2 and H/5.5, respectively, where FL is the fork length. These values were determined by observing standard mackerel. An example of the finalized region is shown in Figure 12. This area includes the spot pattern necessary to discriminate between S. japonicus and S. australasicus.
Texture features are extracted from the designated area based on the GLCM [9]. In this matrix, element (i, j) represents the probability p_δ(i, j), given δ = (d, θ) as a parameter. Here, p_δ(i, j) is the probability that a given pixel's intensity is i and another pixel's intensity is j, where the distance between these pixels is d, and the angle between the line connecting the two pixels and a horizontal line is θ (Figure 13). The value of θ can be 0°, 45°, 90°, or 135°.
This study uses five texture features: contrast, correlation, energy, homogeneity, and entropy. Contrast is the difference in intensity between neighboring pixels (the larger the contrast, the greater the intensity difference). Correlation is the linear dependence of intensities between neighboring pixels; a higher value indicates that there are pixels with similar intensity values in a specific direction. Energy is the uniformity between neighboring pixels; a high energy value shows that more pixels and patterns of specific intensity values are in the image. Homogeneity is the similarity between neighboring pixels; thus, the image has more pixels with similar intensity values if homogeneity values are high. Entropy is the randomness between neighboring pixels; greater entropy indicates that more pixels with different intensity values are present in the image.
Table 1 shows the parameters for calculating each feature.

Figure 13.
Parameters for calculating the gray level co-occurrence matrix to determine texture features.
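To make the GLCM and the five features concrete, the following sketch builds a normalized, non-symmetric co-occurrence matrix for one (d, θ) pair and evaluates commonly used definitions of the features. The exact formulas and parameters used in the study are those of Table 1 and are not reproduced here; the homogeneity form 1/(1 + |i - j|), the natural logarithm for entropy, and all function names are assumptions.

```python
import math

# Unit pixel offsets (dx, dy) for each angle, with y growing downward.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def glcm(image, levels, d=1, theta=0):
    """Normalized co-occurrence matrix p[i][j] for distance d and angle theta (degrees)."""
    dx, dy = OFFSETS[theta][0] * d, OFFSETS[theta][1] * d
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            nx, ny = x + dx, y + dy
            if 0 <= ny < len(image) and 0 <= nx < len(row):
                counts[value][image[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Contrast, correlation, energy, homogeneity, and entropy of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    sd = math.sqrt(var_i * var_j)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": cov / sd if sd > 0 else float("nan"),  # undefined for flat images
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


image = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
print(texture_features(glcm(image, levels=2, d=1, theta=0)))
```

A spotted ventral region produces more off-diagonal mass in the matrix than a uniform silver-white one, which raises contrast and lowers homogeneity.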

Results and Discussion
We performed experiments on 19 images of S. japonicus and 18 images of S. australasicus. Thirty-three of these images were from the fish dataset FishPix [29] of the Kanagawa Prefectural Museum of Natural History. Figure 1 depicts an example image used for the experiment.
The first experiment isolated features effective for discrimination from a pool of all GLCM features. Because the number of images in the dataset was small, we used double cross-validation to determine SVM parameters and to allow model learning. The dataset was subdivided into S_1 and S_2. S_1 included 19 images (10 S. japonicus images and 9 S. australasicus images), while S_2 included 18 images (9 S. japonicus images and 9 S. australasicus images). First, we performed a leave-one-out experiment to determine SVM parameters using dataset S_1. Next, we used those SVM parameters to perform another leave-one-out experiment that classified each image in S_2. The same process was repeated with the datasets switched. We calculated the accuracy by integrating these two results (Table 2). Linear and radial basis function (RBF) kernels were used separately for the SVM.
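The structure of this double cross-validation can be sketched independently of the classifier. In the sketch below, a one-dimensional k-nearest-neighbor rule stands in for the SVM purely to keep the example self-contained, and its hyperparameter k plays the role of the SVM parameters; the feature values, labels, and function names are hypothetical.

```python
def knn_predict(train, x, k):
    """Stand-in classifier (not an SVM): majority label among the k nearest samples."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def loo_correct(data, k):
    """Leave-one-out: count samples classified correctly when each is held out in turn."""
    return sum(knn_predict(data[:i] + data[i + 1:], x, k) == label
               for i, (x, label) in enumerate(data))


def double_cross_validation(s1, s2, candidates):
    """Tune on one subset, evaluate on the other by leave-one-out, swap, and pool."""
    correct = 0
    for tune, evaluate in ((s1, s2), (s2, s1)):
        best = max(candidates, key=lambda k: loo_correct(tune, k))  # parameter selection
        correct += loo_correct(evaluate, best)                      # evaluation
    return correct / (len(s1) + len(s2))


# Hypothetical one-dimensional feature values with species labels.
s1 = [(0.100, "australasicus"), (0.105, "australasicus"), (0.110, "australasicus"),
      (0.130, "japonicus"), (0.135, "japonicus"), (0.140, "japonicus")]
s2 = [(0.098, "australasicus"), (0.104, "australasicus"), (0.112, "australasicus"),
      (0.128, "japonicus"), (0.136, "japonicus"), (0.142, "japonicus")]
print(double_cross_validation(s1, s2, candidates=(1, 3)))
```

Keeping parameter selection and evaluation on different subsets means that no image contributes to choosing the parameters that are later used to classify it.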
When combining contrast and homogeneity as the texture feature, 84% accuracy was obtained, the highest value we observed (higher than using all texture features). Furthermore, the RBF kernel was consistently better than the linear kernel. Therefore, we combined contrast and homogeneity as the texture feature, and selected the RBF kernel for the SVM. Next, the accuracy of the proposed method was verified through a comparison with a closely related, previously published technique [25]. For a fair comparison, the texture features described in the Texture Feature Extraction section were also used for the existing method. We performed a leave-one-out analysis and trained the classifier with 36 out of 37 images. The remaining image was then subjected to the classifier. The parameters used in the experiments were determined as shown in Table 3. The results are shown in Table 4. The proposed method achieved a far higher accuracy than the method of Khotimah et al. [25], verifying the former's effectiveness.

Table 4.
Comparison of classification accuracy between the proposed method and an existing, previously published method of fish discrimination. The row "Ratio only" is the result of classifying the image based only on the BLBS to fork length ratio. The row "Texture only" is the result of using only texture features.

Method                  Accuracy [%]
Proposed method         97
Ratio only              89
Texture only            84
Khotimah et al. [25]    76

Figure 14 shows an example that was not correctly classified by the proposed method. This S. australasicus was inaccurately identified as S. japonicus. The misclassification was likely due to poor BLBS measurements. In this specimen, the peak of the second spine could not be detected due to the spines' close proximity. In addition, the spot pattern on the body was very faint, resulting in a texture feature that was similar to that of S. japonicus. Thus, to improve accuracy, future studies should aim to find a texture feature that can distinguish between the two mackerel species even when specimens have faint spot patterns.

Conclusions
In this study, we proposed a novel method for discriminating S. japonicus and S. australasicus, using a combination of geometric and texture features extracted from standardized images. We used the BLBS to fork length ratio as the distinguishing geometric feature, as well as GLCM-calculated contrast and homogeneity values as the texture feature. Mackerel were then classified based on these features, using an SVM with the RBF kernel.
Experimental results verified the effectiveness of the proposed method. Due to issues during image acquisition, we observed several cases where the texture feature was unclear or the length measurement failed. However, even in such cases, the proposed method was capable of separating fish robustly because both features were always taken into account.
We reported experimental results with a small dataset because, as far as we know, there are few public datasets of S. japonicus and S. australasicus labelled by experts. Conducting verification experiments with a larger dataset and showing the statistical properties of the proposed method remain future work.
In addition, we determined some parameter values manually because we did not have enough data for statistical determination. Therefore, these values are not necessarily optimal. An important future task would be to obtain optimal parameter values using techniques such as machine learning with a far greater variety of images.