Evaluation of an Airborne Remote Sensing Platform Consisting of Two Consumer-Grade Cameras for Crop Identification

Remote sensing systems based on consumer-grade cameras have been increasingly used in scientific research and remote sensing applications because of their low cost and ease of use. However, the performance of consumer-grade cameras for practical applications has not been well documented in related studies. The objective of this research was to apply three commonly-used classification methods (unsupervised, supervised, and object-based) to three-band imagery with RGB (red, green, and blue bands) and four-band imagery with RGB and near-infrared (NIR) bands to evaluate the performance of a dual-camera imaging system for crop identification. Airborne images were acquired from a cropping area in Texas and mosaicked and georeferenced. The mosaicked imagery was classified using the three classification methods to assess the usefulness of NIR imagery for crop identification and to evaluate performance differences between the object-based and pixel-based methods. Image classification and accuracy assessment showed that the additional NIR band imagery improved crop classification accuracy over the RGB imagery and that the object-based method achieved better results with additional non-spectral image features. The results from this study indicate that the airborne imaging system based on two consumer-grade cameras used in this study can be useful for crop identification and other agricultural applications.


Introduction
Remote sensing has played a key role in precision agriculture and other agricultural applications [1]. It provides a very efficient and convenient way to capture and analyze agricultural information. As early as 1972, the Multispectral Scanner System (MSS) sensors were used for accurate identification of agricultural crops [2]. Since then, numerous commercial satellite and custom-built airborne imaging systems have been developed for remote sensing applications with agriculture being [...]
In this area, the main crops in the 2015 growing season were cotton, corn, sorghum, soybean and watermelon. Cotton was the main crop with the largest cultivated area, and its growing conditions were very diverse because of different planting dates and management conditions. Most cornfields were near physiological maturity with very few green leaves, and most sorghum fields were in the generative phase and beginning to senesce at the imaging time. Corn in particular was drying in the fields before harvest. Soybean and watermelon were at vegetative growth stages. Because of cloudy and rainy weather in much of May and June, aerial imagery was not acquired during the optimum period for crop discrimination based on the crop calendars for this study area. However, weather conditions of this kind are probably a common dilemma for agricultural remote sensing.

Imaging System and Platform
The dual-camera imaging system used in this study consisted primarily of two Nikon D90 digital CMOS cameras with Nikon AF Nikkor 24mm f/2.8D lenses (Nikon, Inc., Melville, NY, USA). One camera was used to capture three-band RGB images. The other camera was modified to capture NIR images by replacing the infrared-blocking filter installed in front of its CMOS sensor with a 720 nm long-pass filter (Life Pixel Infrared, Mukilteo, WA, USA). The other components of the system included two GPS receivers, a video monitor and a wireless remote trigger, as shown in Figure 2. A detailed description is given for the single-camera imaging system described by Yang et al. [18]. The difference between the two imaging systems was that the single-camera system contained only one Nikon D90 camera for taking RGB images, while the dual-camera system had the RGB camera and a modified camera for the NIR imaging necessary for this study. This dual-camera imaging system was attached via a camera mount box to an Air Tractor AT-402B, as shown in Figure 2.

Spectral Characteristics of the Cameras
The spectral sensitivity of the two cameras was measured in the laboratory with a monochromator (Optical Building Blocks, Inc., Edison, NJ, USA) and a calibrated photodiode. The two cameras were spectrally calibrated with their lenses by photographing monochromatic light from the monochromator projected onto a white panel. The calibrated photodiode was positioned at the same distance as the camera to measure the light intensity. The relative spectral response for a given wavelength λ and a given channel (R, G, or B) was calculated as shown in Equation (1) [23]:
$$R(\lambda, n) = \frac{C(\lambda, n) - C_{\mathrm{dark}}}{I(\lambda)}, \quad n = r, g, b \qquad (1)$$
where R(λ, n) is the spectral response of channel n = r, g, b at wavelength λ; I(λ) is the light intensity measured with the photodiode at wavelength λ; C(λ, n) is the digital count corresponding to channel n = r, g, b at wavelength λ; and C_dark is the mean digital count of the dark background of channel n = r, g, b at wavelength λ. Wavelength ranged from 400 to 1000 nm, and the measurement wavelength interval was 20 nm. The average digital count for each channel was determined for the center of the projected light beam using image-processing software (MATLAB R2015a, MathWorks, Inc., Natick, MA, USA). The images were recorded in the camera's raw format.
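The calculation described above can be illustrated with a short sketch. It assumes the relative response is the dark-corrected digital count divided by the photodiode intensity, followed by normalization to the peak value. All numbers and function names below are hypothetical and are not measurements or code from this study.

```python
def relative_spectral_response(counts, dark_count, intensities):
    """Dark-corrected digital count divided by light intensity, per wavelength."""
    return [(c - dark_count) / i for c, i in zip(counts, intensities)]


def normalize(values):
    """Scale a response curve so that its maximum equals 1."""
    peak = max(values)
    return [v / peak for v in values]


# Hypothetical values for one channel at 20 nm steps (illustration only).
wavelengths_nm = [600, 620, 640, 660]
counts = [1200.0, 2100.0, 1600.0, 700.0]  # mean digital count at the beam center
dark_count = 100.0                        # mean digital count of the dark background
intensities = [2.0, 2.5, 2.0, 1.5]        # photodiode reading, arbitrary units

response = relative_spectral_response(counts, dark_count, intensities)
normalized = normalize(response)

for wl, value in zip(wavelengths_nm, normalized):
    print(f"{wl} nm: {value:.3f}")
```

Repeating this for each channel of each camera would give curves comparable in form to the normalized sensitivities plotted in Figure 3.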
From the normalized spectral sensitivity of the two cameras (Figure 3), the sensitivity varied from 400 to 700 nm for the non-modified camera and from 680 to 1000 nm for the modified camera. With the 720 nm long-pass filter, the spectral response rose from 0 at 680 nm to its maximum at 720 nm. The spectral sensitivity curves overlap among the channels of each camera. This is very common in consumer-grade cameras, and it is one of the reasons that this type of camera was not commonly used for most scientific applications in the past. For the modified camera, the red channel had a much stronger response in the NIR range than the other two channels (blue and green) and the monochrome imaging mode. Thus, the red channel was chosen as the NIR image for remote sensing applications.

Figure 3. Normalized spectral sensitivity of two Nikon D90 cameras and relative reflectance of 10 land use and land cover (LULC) classes. The dotted lines represent different channels of the RGB camera (Nikon-color-r, Nikon-color-g, and Nikon-color-b) and the modified NIR camera (Nikon-nir-r, Nikon-nir-g, Nikon-nir-b, and Nikon-nir-mono). The solid lines represent the relative reflectance of 10 LULC classes.

Airborne Image Acquisition
Airborne images were taken over the study area at an altitude of 1524 m (5000 ft) above ground level (AGL) with a ground speed of 225 km/h (140 mph) on 15 July 2015 under sunny conditions. The spatial resolution was 0.35 m at this height. To achieve at least 50% overlap along and between the flight lines, images were acquired at 5-s intervals. The two cameras simultaneously and independently captured 144 images each over the study area. Each image was recorded in both 12-bit RAW format for processing and JPG format for viewing and checking.
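The reported spatial resolution and overlap can be checked with a simple pinhole-camera calculation. The sketch below is only an illustration. The sensor dimensions and pixel counts are assumed values for the Nikon D90 and are not stated in the text, and the camera orientation relative to the flight direction is also not stated, so both cases are computed.

```python
# Flight parameters reported in the text.
altitude_m = 1524.0        # above ground level
focal_length_mm = 24.0     # Nikkor 24 mm lens
ground_speed_kmh = 225.0
interval_s = 5.0

# Assumed Nikon D90 sensor geometry (not given in the text).
sensor_width_mm = 23.6
image_width_px, image_height_px = 4288, 2848

# Ground sampling distance: altitude * pixel pitch / focal length.
pixel_pitch_mm = sensor_width_mm / image_width_px
gsd_m = altitude_m * pixel_pitch_mm / focal_length_mm

# Ground footprint of one frame along its long and short sides.
footprint_long_m = gsd_m * image_width_px
footprint_short_m = gsd_m * image_height_px

# Distance flown between consecutive exposures.
spacing_m = ground_speed_kmh / 3.6 * interval_s

# Forward overlap for either sensor side aligned with the flight direction.
overlap_short_side = 1.0 - spacing_m / footprint_short_m
overlap_long_side = 1.0 - spacing_m / footprint_long_m

print(f"GSD: {gsd_m:.3f} m")
print(f"Exposure spacing: {spacing_m:.1f} m")
print(f"Forward overlap: {overlap_short_side:.0%} (short side) "
      f"or {overlap_long_side:.0%} (long side)")
```

Under these assumptions the computed resolution is close to the reported 0.35 m, and the forward overlap exceeds 50% for either camera orientation.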

Image Pre-Processing
Vignetting and geometric distortion are inherent issues of most cameras and usually cause some inaccuracy in image analysis results, especially for modified cameras [21,24]. Therefore, the free Capture NX-D 1.2.1 software (Nikon, Inc., Tokyo, Japan) provided by the camera manufacturer was used to correct the vignetting and geometric distortion in the images. The corrected images were saved in 16-bit TIFF format to preserve image quality.
There were 144 images to be mosaicked for each camera. The Pix4D Mapper software (Pix4D, Inc., Lausanne, Switzerland), a package for automatic image mosaicking with high accuracy [25], was selected. To improve the positional accuracy of the mosaicked image, white plastic square panels with a side of 1 m were placed across the study area. A Trimble GPS Pathfinder ProXRT receiver (Trimble Navigation Limited, Sunnyvale, CA, USA), which provided a 0.2-m average horizontal position accuracy with the real-time OmniSTAR satellite correction, was used to collect the coordinates of these panels. Fifteen ground control points (GCP), as shown in Figure 1, were used for geo-referencing. As shown in Figure 4, the spatial resolutions were 0.399 and 0.394 m for the mosaicked RGB and NIR images, respectively. The absolute horizontal position accuracy was 0.470 and 0.701 m for the respective mosaicked images. These positional errors were well within 1 to 3 times the ground sampling distance (GSD), or spatial resolution [26].

To generate a mosaicked four-band image, the mosaicked RGB and NIR images were registered to each other using the AutoSync module in ERDAS Imagine (Intergraph Corporation, Madison, AL, USA). The RGB image was chosen as the reference image because it had better image quality than the NIR image. Several tie control points were chosen manually before the automatic registration. Thousands of tie points were generated by AutoSync, and a third-order polynomial geometric model, as recommended for this number of tie points, was used [27]. The root mean square (RMS) error for the registration was 0.49 pixels, or 0.2 m. The combined image was resampled to a 0.4-m spatial resolution. The color-infrared (CIR) composite of the four-band image is shown in Figure 4.
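The positional figures above can be cross-checked with simple arithmetic. The sketch below uses only the values reported in the text. It assumes that the registration RMS error in pixels is converted to meters using the pixel size of the RGB reference mosaic, which the text does not state explicitly.

```python
# Reported values: (horizontal position error in m, spatial resolution in m).
mosaics = {"RGB": (0.470, 0.399), "NIR": (0.701, 0.394)}

# Positional error expressed as a multiple of the ground sampling distance.
error_in_gsd = {name: err / gsd for name, (err, gsd) in mosaics.items()}

# Registration RMS error converted from pixels to meters
# (assumption: pixel size of the RGB reference mosaic).
rms_pixels = 0.49
rms_m = rms_pixels * mosaics["RGB"][1]

for name, ratio in error_in_gsd.items():
    print(f"{name}: positional error = {ratio:.2f} x GSD")
print(f"Registration RMS error: {rms_m:.2f} m")
```

Both ratios fall between 1 and 3, and the converted RMS error rounds to the reported 0.2 m.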

Crop Identification
The selection of band combinations and classification methods generally influences classification results. To quantify and analyze these effects on crop identification, three typical and general image classification methods (unsupervised, supervised and object-based) were selected. To examine how the number of LULC classes affects the classification results, six different class groupings were defined, as shown in Table 1. The three-band or four-band image was first classified into 10 classes, and the classification results were then regrouped into six, five, four, three and two classes. For the ten-class grouping, the impervious class mainly included solid roads and buildings. Bare soil and fallow were grouped into one class, and the water class included rivers, ponds, and pools. Because soybean and watermelon accounted for only a small portion of the study area, they were treated as non-crop vegetation with grass and forest in the five-class grouping and as non-crop in the four- and three-class groupings.

Pixel-Based Classification
The unsupervised Iterative Self-Organizing Data Analysis Technique (ISODATA) and the supervised maximum likelihood classification were chosen as the pixel-based methods in this study. For ISODATA, given the diversity and complexity of the land cover in the study area, the number of clusters was set to ten times the number of land cover classes. The maximum number of iterations was set to 20 and the convergence threshold to 0.95. All of the clusters were then assigned to the 10 land cover classes. For the supervised maximum likelihood classification, each class was further divided into 3 to 10 subclasses because of the variability within each land cover class. After supervised classification, these subclasses were merged. For each subclass, 5 to 15 training samples were selected, and the total number of training samples was almost equal to the number of clusters in ISODATA. The same training samples were used for supervised classification of both the three-band and four-band images.
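The subclass strategy described above can be sketched as follows. This is a simplified illustration, not the software implementation used in the study. Standard maximum likelihood classifiers use a full covariance matrix per class, whereas this sketch assumes a diagonal covariance (independent bands) to stay short. The subclass names and pixel values are hypothetical.

```python
import math


def fit_gaussian(samples):
    """Per-band mean and sample variance for one subclass (diagonal covariance)."""
    n = len(samples)
    bands = range(len(samples[0]))
    means = [sum(s[b] for s in samples) / n for b in bands]
    variances = [sum((s[b] - means[b]) ** 2 for s in samples) / (n - 1)
                 for b in bands]
    return means, variances


def log_likelihood(pixel, means, variances):
    """Gaussian log-likelihood of a pixel, summed over independent bands."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
               for x, m, v in zip(pixel, means, variances))


def classify(pixel, models, parent):
    """Pick the most likely subclass, then merge it into its parent class."""
    best = max(models, key=lambda name: log_likelihood(pixel, *models[name]))
    return parent[best]


# Hypothetical four-band (R, G, B, NIR) training samples per subclass.
training = {
    "cotton_a": [(40, 60, 35, 180), (42, 63, 36, 175), (38, 58, 33, 185)],
    "cotton_b": [(55, 70, 45, 150), (57, 73, 47, 145), (53, 68, 44, 155)],
    "water_a": [(20, 30, 40, 10), (22, 31, 42, 12), (19, 28, 39, 9)],
}
parent = {"cotton_a": "cotton", "cotton_b": "cotton", "water_a": "water"}
models = {name: fit_gaussian(samples) for name, samples in training.items()}

label = classify((54, 69, 46, 152), models, parent)
print(label)
```

Splitting a variable class such as cotton into subclasses lets each subclass be modeled by a compact distribution, while the final map still reports a single merged class.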

Object-Based Classification
Object-based image analysis (OBIA) has been recognized to have outstanding classification performance for high-resolution imagery. Segmentation and the definition of classification rules are the main steps of object-based classification. To keep the process transparent and obtain an objective result, the estimation of scale parameter (ESP) tool was used for choosing segmentation parameters [28], and the classifier known as classification and regression tree (CART) was used for generating classification rules.
Segmentation for object-based classification was performed using the commercial software eCognition Developer (Trimble Inc., Munich, Germany). The segmentation process, which integrates spectral, shape and compactness factors, is very important for the subsequent classification [29], but standardized or widely accepted methods for determining the optimal scale for different types of imagery or applications are lacking [30]. To minimize the influence of contrived factors in this step, reference segmentation scales can be estimated with the ESP tool [28], which evaluates the variation in heterogeneity of image objects that are iteratively generated at multiple scale levels to obtain the most appropriate scales. For this study, a scale step of 50 was set to find optimal segmentation scales from 0 to 10,000 with the ESP tool, and three segmentation parameters (SP) (1900, 4550 and 9200) were estimated. To simplify the processing, an SP of 4550 was used for image segmentation, as it suited most of the land cover classes without over-segmentation or under-segmentation. To further improve the segmentation results, spectral difference segmentation with a scale of 1000 was performed to merge neighboring objects with similar spectral values. The three-band and four-band image segmentation produced 970 and 950 image objects, respectively, as shown in Figure 5.
The classification pattern of object-based classification software such as eCognition is mainly based on a series of rules built from several features. User knowledge and past experience can be translated into constraint rules for classification [31]. However, this approach is very unreliable and highly individualized. Therefore, the CART algorithm was used to train the object-based classification rules [32]. Because CART is a non-parametric rule-based classifier with a "white box" workflow [30], the structure and terminal nodes of a decision tree are easy to interpret, allowing the user to understand and evaluate the mechanism of the object-based classification method.
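The interpretability of a decision tree comes from the fact that each node is a simple threshold rule on one feature. The sketch below shows how a single split can be chosen by minimizing Gini impurity, a common CART splitting criterion. Whether eCognition uses exactly this criterion is not stated in the text, and the feature values and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical object-level feature: mean NIR value of each image object.
mean_nir = [30.0, 35.0, 40.0, 120.0, 130.0, 140.0]
classes = ["water", "water", "water", "cotton", "cotton", "cotton"]

threshold, impurity = best_threshold(mean_nir, classes)
print(f"Rule: mean NIR <= {threshold} -> water, otherwise cotton "
      f"(weighted Gini = {impurity})")
```

A full tree repeats this search recursively over all candidate features, and the resulting chain of threshold rules can be read directly as classification rules.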
The CART classifier included in eCognition creates a decision-tree model based on features from training samples. To minimize the impact of selecting different sample sets, the sample sets used in the supervised classification were also used here. The difference was that the samples were turned into image objects. These image objects containing the class information were then used as the training samples for the object-based classification. There were three main feature types used [...]

Accuracy Assessment
For accuracy assessment, 200 random points were generated for each classification map and assigned to the classes in a stratified random pattern. At least 10 points were generated for each class. Three classification methods were applied to two types of images, so there were six classification maps, and a total of 1200 points were used for their accuracy assessment [30]. The number of points and percentages by class type are given in Table 3. Ground verification of all the points for the LULC classes was performed shortly after image acquisition. If one or more points fell within a field, the field was checked. Overall accuracy [46], confusion matrix [47], and kappa coefficient [48] were calculated. To evaluate the performance of the image types and methods, average kappa coefficients were calculated by class and by method.
Figure 6 shows the ten-class classification maps based on the three methods applied to the three-band and four-band images, including unsupervised classification for the three-band image (3US), unsupervised classification for the four-band image (4US), supervised classification for the three-band image (3S), supervised classification for the four-band image (4S), object-based classification for the three-band image (3OB), and object-based classification for the four-band image (4OB). To compare the actual differences between the pixel-based and object-based methods directly [29], no post-processing operations such as clump, sieve, and eliminate were performed on the pixel-based classification maps, and no generalization was applied to the object-based classification maps.
Most of the classification maps appear to distinguish different land cover types reasonably well. Visually, the "salt-and-pepper" effect on the pixel-based maps is the most obvious difference from the object-based maps. The object-based maps present a very good visual effect because different cover types are shown by homogeneous image objects. Without considering the accuracy of the maps, the object-based classification maps look cleaner and are more appropriate for producing thematic maps. Visually, it is difficult to evaluate the differences between the unsupervised and supervised methods or between the three-band and four-band images.
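The overall accuracy and kappa coefficient used in the accuracy assessment are both derived from a confusion matrix. The sketch below shows the standard calculation on a hypothetical two-class matrix with 200 points. The values are invented for illustration and do not come from Table 4.

```python
def overall_accuracy(matrix):
    """Fraction of points on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's kappa: agreement beyond what is expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical matrix: rows are map classes, columns are reference classes
# (crop, non-crop).
matrix = [[90, 10],
          [10, 90]]

print(f"Overall accuracy: {overall_accuracy(matrix):.2f}")
print(f"Kappa: {kappa(matrix):.2f}")
```

Because kappa discounts chance agreement, it is lower than the overall accuracy for the same map, which is consistent with the pattern of the accuracy and kappa values reported in this study.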

Classification Results
Specifically, the classification results for water and impervious were highly consistent. Because of the lack of an NIR band, some water areas in the three-band image were classified as bare soil and fallow by all the methods. Sorghum and corn were difficult to distinguish because both crops were at their late growth stages with reduced green leaf area. Corn was close to physiological maturity and its above-ground biomass was fully senescent, whereas sorghum was in the generative phase and had started senescence but still had significant green leaf material. Although the late growth stages caused a reduction in canopy NDVI values for both corn and sorghum, background weeds and soil exposure also affected the overall NDVI values. All crops and cover types show varying degrees of confusion among themselves. This problem also occurred in the object-based maps, but it does not appear to be as obvious as in the pixel-based maps.
Table 4 summarizes the accuracy assessment results for the ten-class and two-class classification maps for the three methods applied to the two images. The accuracy assessment results for the other groupings are discussed and compared with the ten-class and two-class results in Section 4.3. Overall accuracy for the ten-class classification maps ranged from 58% for 3US to 78% for 4OB, and overall kappa ranged from 0.51 for 3US to 0.74 for 4OB. As expected, the overall accuracy and kappa were higher for the four-band image than for the three-band image for all three methods. Among the three methods, the object-based method performed better than the two pixel-based methods, and the supervised method was slightly better than the unsupervised method.

Accuracy Assessment
For the individual classes, the non-plant classes such as water, impervious, and bare soil and fallow had better and more stable accuracy results for all six scenarios, with average kappa values of 0.85, 0.82 and 0.74, respectively. Because of variable growth stages and management conditions, the main crop class, cotton, had a relatively low accuracy with an average kappa of 0.47 across the scenarios. Although they were at later growth stages, sorghum and corn had relatively good accuracy with average kappa values of 0.67 and 0.62, respectively. The main reason was that both crops were senescing and had less green leaf material, so they could easily be distinguished from other vegetation. Soybean and watermelon had unstable accuracy results among the six scenarios, but their differentiation was significantly improved with the object-based method. The grass and forest in the study area were difficult to distinguish using the pixel-based methods, but they were more accurately separated with the object-based method.
For the two broad classes (crop and non-crop), overall accuracy ranged from 76% for 3US to 91% for 4OB and overall kappa from 0.51 for 3US to 0.82 for 4OB. Clearly, overall accuracy and kappa were generally higher for the two-class maps than for the ten-class maps. The two-class classification maps will be useful for applications in which total cropping area information is needed.

Importance of NIR Band
To analyze the importance of the NIR band, some kappa coefficients from Table 4 were rearranged, and the average coefficients by image (AKp1) and by method (AKp2) were calculated (Table 5). Table 5 shows that the NIR band improved the kappa coefficients for four of the five crops and for three of the five non-crop classes. The net increases in AKp1 for the four crops were 0.28 for soybean, 0.12 for watermelon, 0.07 for cotton and 0.03 for sorghum, while the decrease in AKp1 for corn was 0.05. Although the classification for soybean was greatly improved, soybean accounted for only a very small portion of the study area, less than 2.5%. Because of its small area and misclassification, the classification results for soybean were unstable, as shown by the unique zero kappa value in Table 4. The improvement for watermelon was mainly due to the object-based classification method. The classification for corn worsened mainly because of its later growth stage. Corn had low chlorophyll content, as shown by its flat RGB response, and reduced water content, as indicated by its relatively low NIR reflectance compared with the other vegetation classes. These observations are confirmed by the spectral curves shown in Figure 3, which were derived by calculating the average spectral values of each class from the training samples for the supervised classification. The spectral curve of corn had the lowest reflectance in the NIR band among the vegetation classes. In other words, the NIR band was not sensitive to corn at this stage, which had an NIR response similar to that of the bare soil and fallow fields. In fact, the bare soil and fallow class was one of the main classes misclassified with corn, as shown in Table 4.
For the non-crop classes, the NIR band improved the classification for the water, impervious and bare soil classes. This result conforms with the general knowledge that NIR is effective at distinguishing water and impervious. The classes of grass and forest also benefited from the NIR band with the supervised method.
To compare the differences between the three-band and four-band images by classification method, average kappa coefficients (AKp2) increased for each of the three methods for the combined crop class and for two of the three methods for the combined no-crop class. If AKp3 is the average of the AKp2 values for the three methods, AKp3 increased from 0.52 for the three-band image to 0.61 for the four-band image for the crop class, and from 0.67 for the three-band image to 0.71 for the four-band image for the non-crop class. The crop class benefited from the NIR band more than the non-crop class.
If AKp4 is the average of the AKp3 values for the two general classes, AKp4 increased from 0.60 for the three-band image to 0.66 for the four-band image. Therefore, the addition of the NIR band improved the classification results over the normal RGB image.
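As a quick arithmetic check of the averages quoted above, the following minimal Python sketch recomputes AKp4 from the AKp3 values reported in the text. The variable and function names are illustrative and are not part of the original analysis.

```python
# AKp3 values reported in the text, keyed by image and general class.
akp3 = {
    "three-band": {"crop": 0.52, "non-crop": 0.67},
    "four-band": {"crop": 0.61, "non-crop": 0.71},
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


# AKp4 is the average of the AKp3 values over the two general classes.
akp4 = {image: mean(by_class.values()) for image, by_class in akp3.items()}

# Net gain in AKp3 from adding the NIR band, per general class.
gain = {
    cls: round(akp3["four-band"][cls] - akp3["three-band"][cls], 2)
    for cls in ("crop", "non-crop")
}

for image, value in akp4.items():
    print(f"AKp4 ({image}): {value:.3f}")
print("Gain from NIR:", gain)
```

The three-band average works out to 0.595, which rounds to the reported 0.60, and the per-class gains (0.09 for crop versus 0.04 for non-crop) match the observation that the crop class benefited more from the NIR band.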
To illustrate the classification results and explain the misclassification between some classes, the spectral separability between any two classes in terms of Euclidean distance was calculated in ERDAS. To facilitate discussion, the Euclidean distance was normalized by the following formula: where x is the absolute Euclidean distance between any two classes based on the training samples, x0 is the average of all the two-class Euclidean distances for either the three-band or four-band image, and x1 is the normalized spectral distance ranging from −1 for the worst separability to 1 for the best separability.

From the normalized Euclidean distance results shown in Figure 7, the forest and impervious classes had the best separation, while soybean and cotton had the worst separation for both the three-band and four-band images. These results help explain why some of the classes had higher classification accuracy and kappa values than others. In general, the non-crop classes such as forest, water and impervious had high separability from the crop classes, while the crop classes had relatively low separability among themselves. The fact that corn and sorghum are near the bottom of the list also explains why they were difficult to separate. There are more class pairs above the average spectral separability for the four-band image than for the three-band image, indicating that the NIR band is useful for crop identification, especially for plants in their vegetative growth periods.
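The separability comparison can be sketched in a few lines of Python: compute the Euclidean distance between class mean spectra for every class pair, then list the pairs whose distance exceeds the average. The class means below are made-up placeholder values, not the training statistics from this study, and the sketch reproduces neither the ERDAS computation nor the normalization formula.

```python
import math
from itertools import combinations

# Hypothetical mean digital numbers per band (R, G, B, NIR) for a few classes.
# These values are placeholders for illustration only.
class_means = {
    "water": (40.0, 55.0, 60.0, 15.0),
    "forest": (50.0, 80.0, 45.0, 160.0),
    "cotton": (70.0, 110.0, 60.0, 180.0),
    "soybean": (72.0, 112.0, 62.0, 185.0),
}


def pairwise_distances(means, bands):
    """Euclidean distance between class means, using the given band indices."""
    result = {}
    for a, b in combinations(sorted(means), 2):
        result[(a, b)] = math.sqrt(
            sum((means[a][i] - means[b][i]) ** 2 for i in bands)
        )
    return result


def pairs_above_average(distances):
    """Class pairs whose distance exceeds the mean of all pairwise distances."""
    average = sum(distances.values()) / len(distances)
    return sorted(pair for pair, d in distances.items() if d > average)


rgb = pairwise_distances(class_means, bands=(0, 1, 2))
rgbn = pairwise_distances(class_means, bands=(0, 1, 2, 3))

print("Least separable (RGB):", min(rgb, key=rgb.get))
print("Above average (RGB):", pairs_above_average(rgb))
print("Above average (RGB+NIR):", pairs_above_average(rgbn))
```

Adding a band can only increase the distance for a given class pair, but the average distance also changes, so the number of pairs above average has to be recounted for each band combination.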

Importance of Object-Based Method
As can be seen from Tables 4 and 5, the selection of the classification method had a great effect on classification results. To clearly see this effect, the kappa analysis results were rearranged by classification method as shown in Table 6.
The average kappa coefficients between the three-band and four-band images (AKp5) were calculated for all the crop and non-crop classes for each of the three classification methods. For all the crop classes, the object-based method performed best with the highest AKp5 values, followed by the supervised and unsupervised methods. Moreover, the object-based method performed better than the other two methods for all the non-crop classes except for water, for which the unsupervised method was the best. Similarly, if AKp6 is the average of the AKp5 values for the five crop classes, the AKp6 values for the crop class were 0.43, 0.51 and 0.76 for the unsupervised, supervised and object-based methods, respectively. The AKp6 values for the non-crop class were 0.65, 0.65 and 0.78 for the three respective classification methods.
Clearly, the object-based method was superior to the pixel-based methods. This was because the object-based method used many shape and texture features, as shown in Table 2, to create homogeneous image objects as the processing units during the classification, while the pixel-based classification methods only used the spectral information in each pixel. Figure 8 shows the decision trees and the number of features involved in the image classification process using the object-based method, which were created automatically by eCognition.
Figure 8. Decision tree models for object-based classification. Abbreviations for the 10 classes: IM = impervious, BF = bare soil and fallow, GA = grass, FE = forest, WA = water, SB = soybean, WM = watermelon, CO = corn, SG = sorghum, and CT = cotton. (n) is the number ID of each feature, ranging from (1) to (42), as described in Table 2.
Figure 9 shows the average kappa coefficients and differences for the crop and non-crop classes for the three classification methods. The difference in AKp6 between the crop and non-crop classes decreased from 0.22 for the unsupervised method to 0.14 for the supervised method and to 0.02 for the object-based method. Evidently, the non-crop class had a better average kappa coefficient than the crop class for the pixel-based methods because most of the non-crop classes, such as the water, impervious, and bare soil and fallow classes, had better spectral separability than the other classes. However, both crop and non-crop had essentially the same average kappa coefficient for the object-based classification method.
Figure 9. Average kappa coefficients and their differences for the crop and non-crop classes for the three classification methods. AKp5 = average kappa coefficient between the three-band and four-band images, and AKp6 = average of the AKp5 values for the crop or non-crop classes.
To explain the reason for this, the statistical results for the decision tree models used in the object-based classification method are summarized in Table 7. It can be seen from Figure 8 and Table 7 that the three-band image used more non-spectral features at a higher frequency than the four-band image, which could compensate for the lack of the NIR band in the normal RGB image. Most of the branches of the three-band and four-band decision tree models used the shape and texture features (95% for the three-band and 82% for the four-band image). These features were used more than once, with an average of 1.62 times for the three-band and 1.15 times for the four-band image. All these results show the importance and advantage of the non-spectral features for image classification. The non-spectral features are particularly important when there is insufficient spectral information.
As shown in Table 6, the pixel-based methods performed better than the object-based method for distinguishing water. This is because the spectral information was enough to distinguish water, and the non-spectral features could cause a worse result with the object-based method. Thus, the four-band image with the pixel-based methods achieved better classification results for water.

Importance of Classification Groupings
Thus far, only the ten-class and two-class classification results shown in Table 4 have been discussed. Figure 10 shows the overall accuracy and overall kappa for the six class groupings defined in Table 1 based on the six classification types. The overall accuracy generally increased as the number of classes decreased. However, this was not necessarily the case for the overall kappa. The two-class, five-class and ten-class classifications had higher kappa values than the three-class, four-class and six-class classifications except that the two-class classification for the object-based method had slightly higher kappa values than the ten-class classifications. Overall classification accuracy simply considers the probability of image pixels being correctly identified in the classification map. The kappa coefficient, by contrast, considers not only the correct classification but also the effect of omission and commission errors. From the spectral separability shown in Figures 3 and 7, it could be found that class groupings with poor spectral separability between subclasses generally had higher kappa values. For example, the five-class classifications achieved the second highest kappa value for both the supervised and object-based methods. This particular class grouping combined four vegetation classes (grass, forest, soybean and watermelon) with similar spectral characteristics into one class. Approximately two-thirds of the spectral separability values between any two of the four classes were below the average level, and these classes were easily confused during the classification process. This confusion was eliminated when these classes were grouped into one class. Therefore, depending on the requirements of particular applications, all available classes can be regrouped based on their spectral characteristics into appropriate classes to improve classification results. With such a regrouping, the agronomic usefulness of the classification map is reduced for the relevant crops, but the map could still be used for a land use/land cover (LULC) census.
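To make the distinction between overall accuracy and kappa concrete, here is a minimal Python sketch that computes both from a confusion matrix and then merges two frequently confused classes into one. The matrix is a made-up three-class example, not data from Table 4, and the function names are illustrative.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_sums = [sum(matrix[i]) for i in range(n)]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement is derived from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total**2
    return (observed - expected) / (1 - expected)


def merge_classes(matrix, groups):
    """Collapse a confusion matrix; groups lists the old indices per new class."""
    return [
        [sum(matrix[i][j] for i in gi for j in gj) for gj in groups]
        for gi in groups
    ]


# Hypothetical matrix (rows: reference, columns: classified).
# Classes 0 and 1 are often confused; class 2 is well separated.
cm = [
    [30, 20, 0],
    [20, 30, 0],
    [0, 0, 50],
]
merged = merge_classes(cm, groups=[[0, 1], [2]])

print("3 classes:", overall_accuracy(cm), kappa(cm))
print("2 classes:", overall_accuracy(merged), kappa(merged))
```

In this toy case the confusion between classes 0 and 1 disappears once they are merged, so both measures rise. With real data the effect on kappa depends on how the merge changes the row and column marginals, which is why kappa does not necessarily increase as the number of classes decreases.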

Implications for Selection of Imaging Platform and Classification Method
From the above analysis, both the additional NIR band and the object-based method could improve the performance of image classification for crop identification. The imaging system used in this study included a modified camera to capture NIR information. The camera, along with the lens, GPS, and modification fees, cost about $1300. Moreover, the images from the two cameras need to be aligned for analysis. The object-based classification method performed better than the pixel-based methods. However, the object-based method involves complex and time-consuming processing such as segmentation and rule training, and requires experienced operators to use the software. Therefore, how to weigh such factors as cost, ease of use and acceptable classification results is a real and practical issue for users, especially for those without much remote sensing knowledge and experience.
Based on the results from this study, some suggestions are provided for consideration. If users do not have much experience in image processing, a single RGB camera with pixel-based classification can be used. For users with some image processing experience, a dual-camera system with NIR sensitivity and pixel-based classification methods may be a good combination. For users with sufficient image processing experience, either a single RGB camera or a dual-camera system in conjunction with object-based classification may be an appropriate choice. It is also possible to modify a single RGB camera to have two visible bands and one NIR band [16,49]. This eliminates the image alignment required with the dual-camera system.

Conclusions
This study addressed important and practical issues related to the use of consumer-grade RGB cameras and modified NIR cameras for crop identification, which is a common remote sensing application in agriculture. By comprehensively comparing the performance of the three commonly-used classification methods with the three-band and four-band images over a relatively large cropping area, some interesting results have been found.
Firstly, the NIR image from the modified camera improved the classification results over those from the normal RGB image alone. This finding is consistent with common knowledge and with results from scientific-grade imaging systems. Moreover, the importance of the NIR band appears to be especially evident in the classification results from the pixel-based methods. Since pixel-based methods are usually easy to use for users without much experience in remote sensing, imaging systems with more spectral information should be used by these users.
Secondly, many non-spectral features such as shape and texture can be obtained from the image to improve the accuracy of image classification. However, object-based methods are more complex and time-consuming and require a better understanding of the classification process, so they are best suited to advanced users with extensive image processing experience, who can obtain good results even with RGB images. Moreover, appropriately grouping classes with similar spectral responses can improve classification results if these classes do not need to be separated. All in all, the selection of imaging systems, image processing methods, and class groupings needs to consider the budget, the application requirements and the operating personnel's experience. The results from this study have demonstrated that the dual-camera imaging system is useful for crop identification and has the potential for other agricultural applications. More research is needed to evaluate this type of imaging system for crop monitoring and pest detection.