Object-Based Greenhouse Classification from GeoEye-1 and WorldView-2 Stereo Imagery

Remote sensing technologies have been commonly used to perform greenhouse detection and mapping. In this research, stereo pairs acquired by very high-resolution optical satellites GeoEye-1 (GE1) and WorldView-2 (WV2) have been utilized to carry out the land cover classification of an agricultural area through an object-based image analysis approach, paying special attention to greenhouses extraction. The main novelty of this work lies in the joint use of single-source stereo-photogrammetrically derived heights and multispectral information from both panchromatic and pan-sharpened orthoimages. The main features tested in this research can be grouped into different categories, such as basic spectral information, elevation data (normalized digital surface model; nDSM), band indexes and ratios, texture and shape geometry. Furthermore, spectral information was based on both single orthoimages and multiangle orthoimages. The overall accuracy attained by applying nearest neighbor and support vector machine classifiers to the four multispectral bands of GE1 were very similar to those computed from WV2, for either four or eight multispectral bands. Height data, in the form of nDSM, were the most important feature for greenhouse classification. The best overall accuracy values were close to 90%, and they were not improved by using multiangle orthoimages.


Introduction
Since the first plastic-covered greenhouses were built in agriculture on a significant scale by the early 1950s, the area covered by them has been increasing at a fast rate [1].North Africa, the Middle East and China are growing at 15%-30% annually, while Europe (mainly Spain, Italy and France) presents a much lower increase.Currently, there are about 1,600,000 ha of greenhouses and walk-in tunnels scattered all over the world [1].
The practice of plastic-covered agriculture, also known as plasticulture, is part of the transformation of conventional farming into a more industrial and high-tech precision agriculture [2].It is widely recognized that this type of cultivation is linked to a very important anthropic impact [3].In fact, the construction of greenhouses, together with the necessary infrastructure for their commercial exploitation (e.g., road networks, rural buildings, reservoirs and irrigation ponds), contribute to affecting the environment [4].Therefore, a careful spatial development planning is required in these agricultural areas to minimize their environmental impact [4][5][6].
Automatic mapping of greenhouses from remote sensing methods presents a special challenge, due to their unique characteristics [2,5,[7][8][9].In fact, the spectral signature of plastic changes drastically depending on: (i) the plastic material (i.e., thickness, density, transparency, light dispersal, ultraviolet and infrared properties, anti-fog additives and color); (ii) seasonal use of greenhouses (e.g., during summer, polyethylene sheets may be painted white to protect plants against excessive radiation and to lower the temperature inside the greenhouse); (iii) changing the reflectance of the vegetation underneath, which contributes to the overall reflectance of the greenhouse; and (iv) the angle of vision (i.e., the relationship between the light incidence angle and the remote sensor point of view).All these factors may result in different spectral signatures for the same type of land use, which entails a serious risk of confusion with other non-greenhouse classes.
The first works focused on greenhouses detection from satellite data were supported by Landsat Thematic Mapper images.Some of them were carried out in the Netherlands [10,11], in a coastal province of eastern China, named Shandong [12], in southeastern Spain [13] and in Southern Italy [6].However, the main problem of using Landsat images is related to their large pixel size or ground sample distance (GSD).With the advent of the first very high-resolution (VHR) commercial satellites, such as IKONOS and QuickBird, in 1999 and 2001, respectively, the aforementioned problem was solved.In fact, these VHR satellites are capable of capturing panchromatic (PAN) imagery of the land surface with GSD even lower than 1 m.To date, only a few works based on the detection of plastic greenhouses from IKONOS or QuickBird images have been performed [5,7,[14][15][16][17] by using different pixel-based approaches.
More recently, a new breed of VHR satellites with improved geometric and radiometric characteristics has been successfully launched.Among them, GeoEye-1 (GE1) and WorldView-2 (WV2) are the most innovative.Although this couple of sensors has been already tested for automatic classification of urban environments and building extraction [18][19][20][21], their application to greenhouse detection is limited.In fact, only a recent work by Koc-San [22] uses WV2 satellite imagery for the pixel-based classification of glass and plastic greenhouses in Antalya (Turkey).
Regarding VHR satellite imagery, it is worth noting that the higher geometric detail of the PAN image and the useful color information provided by the lower resolution multispectral (MS) image can be fused to produce a final pan-sharpened MS image with the geometric resolution of the PAN band [23].With the availability of pan-sharpened VHR satellite imagery, the classification of small-scale man-made structures has become of great interest.In this way, object-based image analysis (OBIA) has recently proved to be an effective approach to deal with classification tasks, especially in urban environments [24][25][26][27].OBIA does not use individual pixels, but pixel groups representing meaningful segments (or objects), which have been segmented according to different criteria before carrying out the classification stage.Furthermore, unlike traditional pixel-based methods that only use spectral information, object-based approaches benefit from different features, such as shape, texture and context information associated with the objects [24].At this point, and regarding remote sensing image analysis, it should be clearly stated that much of the work referred to as OBIA has been originated around the software known as eCognition (Trimble, Sunnyvale, California, United States).Indeed, about half of the papers related to OBIA are based on this package [24].To the authors' knowledge, OBIA has been only used for greenhouse detection by Arcidiacono and Porto [28] and Tarantino and Figorito [9].Both works applied OBIA techniques to detect plastic greenhouses from digital true color (RGB) aerial data in different study areas of Italy (Ragusa, Scicli and Apulia Region).In these last two works, a very good review about plasticulture extraction from remote sensing imagery was carried out.
Several works have combined MS imagery and height information, often derived from LiDAR data, to improve building automatic detection [29][30][31].In fact, the inclusion of elevation data increases the ability to differentiate objects with significant height, as compared with other spectrally similar classes [19].However, LiDAR data are often expensive, and they are not always available, especially in developing countries.Nowadays, the adaptable stereo imaging capability of the newest civilian VHR satellites and their improved geometric resolution allow the generation of very accurate digital surface models (DSM) by means of standard photogrammetric procedures.In the cases of GE1 and WV2, a vertical accuracy ranging from 0.4 m to 1.2 m has been reported in the literature [32][33][34].In this way, some authors have recently used single-source height and MS or pan-sharpened information from satellite platforms headed up to detailed urban classification or change detection studies [19,21,35,36].
In the case of greenhouse detection, 2D information delivered by satellite images is often not sufficient.For example, it is sometimes difficult to distinguish white plastic greenhouses from white buildings, due to their similar spectral characteristics [10].Likewise, greenhouses only covered by a shadow net film can also be confused with bare soil [13].In order to address this problem, and bearing in mind the aforementioned capabilities of satellite stereo imagery to provide height data, the joint use of 2D and 3D information from VHR stereo pairs seems to be a promising approach.
Moreover, the use of multiangle images has significantly improved the classification over the base case of a single nadir multispectral image [19,37].These improvements could be due to the marked angular and directional variation of the reflectance for certain surfaces.In fact, a multiangle data set from MS WV2 images allowed the differentiation of classes not typically well identified from a single image, such as skyscrapers, bridges and flat and pitched roofs [19].Duca and Del Frate [37] also reported significant improvements in the accuracy of asphalt, buildings and bare soil class detection from using the multiangle CHRIS-PROBA images (18 m GSD) with respect to the nadir acquisition.
Therefore, the goal of this article is strictly focused on comparing, in almost the same conditions, land cover classification accuracy with emphasis on greenhouses between VHR satellite stereo pairs from GE1 and WV2 by using the most complete data sets that can be retrieved from them (i.e., evaluating the implications of adding height and multiangle data to the information contained in a single and near-nadir image).The main novelty of this work lies in the joint use of spectral information from both PAN and pan-sharpened multiangle orthoimages and height data extracted by stereo matching from satellite stereo pairs to greenhouse classification through an OBIA approach.The spectral information includes features grouped into different categories, such as basic spectral information, band indexes and ratios, texture and the shape geometry of objects.

Study Site
The study area is located in the municipality of Cuevas del Almanzora, province of Almerí a, southern Spain (Figure 1).It comprises a rectangular area of about 680 ha around the village of Palomares and is centered on the WGS84 geographic coordinates of 37.2465°N and 1.7912°W.The land use between the urban area of Palomares and the Almanzora River is predominantly agricultural.In August 2011, the study area housed 124 greenhouses, which showed great heterogeneity.Thirty-three of them were covered by plastic material and the rest with different types of mesh or shadow net film.While some of them were being prepared for setting a new crop (i.e., they remained uncropped at the time when the satellite image was taken), others were dedicated to nurseries, and the rest housed different crops underneath (e.g., watermelon, melon, tomato, pepper and cut flowers).

Data Sets
The experimental data sets consist of two stereo pairs from GE1 and WV2 captured over the same study area and very close dates.From these stereo pairs, both orthoimages (2D information) and DSMs (3D information) were generated.
Currently, GE1 is the commercial satellite with the highest geometric resolution, presenting 0.41 m GSD at nadir in PAN mode and 1.65 m GSD at nadir in multispectral (MS) imagery.It includes the four classic bands: blue (B, 450-510 nm), green (G, 510-580 nm), red (R, 655-690 nm) and near-infrared (NIR, 780-920 nm).On the other hand, WV2 (0.46 m and 1.84 m GSD in PAN and MS modes, respectively) is the first commercial VHR satellite providing 8 bands in MS mode.In this sense, it is able to capture the four classical bands already contained in the MS image of its predecessors, i.e., B (450-510 nm), G (510-580 nm), R (630-690 nm) and near-infrared-1 or NIR1 (760-895 nm), and four additional bands, such as coastal (C, 400-450 nm), yellow (Y, 585-625 nm), red edge (RE, 705-745 nm) and near-infrared-2 (NIR2, 860-1,040 nm).Images from either GE1 or WV2 have to be down-sampled to 0.5 m and 2 m, in PAN and MS modes, respectively, for commercial sales, as required by the US Government.
The first stereo pair from WV2 was acquired on 18 August 2011 (Table 1), presenting a convergence angle of 31.35°.It was collected in Ortho Ready Standard Level-2A (ORS2A) format, containing both PAN and MS imagery.WV2 ORS2A images present both radiometric and geometric corrections.They are georeferenced to a cartographic projection using a surface with constant height and include the corresponding RPC sensor camera model and metadata file.The delivered products were ordered with a dynamic range of 11 bits and without the application of the dynamic range adjustment (DRA) preprocessing (i.e., maintaining absolute radiometric accuracy and the full dynamic range for scientific applications).The second stereo pair was collected on 27 August 2011 (Table 1).It was a bundle (PAN + MS) GeoStereo product from GE1 presenting a convergence angle of 30.39°.GE1 GeoStereo images present both radiometric and geometric corrections similar to those applied to WV2 ORS2A imagery.The delivered products had a dynamic range of 11 bits and without DRA preprocessing.The Sun position, the satellite track (north-south) and the stereo geometry of this GeoStereo product were similar to those configuring the aforementioned WV2 stereo pair.However, the first image in the case of the GE1 stereo pair had an off-nadir of 8.5° (the most single nadir image) and the second 23.1°, while in the WV2 case, the second image was the most vertical or near-nadir.
Besides the two described stereo pairs, a medium resolution 10-m grid spacing digital elevation model (DEM) was used.It was compiled from a B&W photogrammetric flight at an approximate scale of 1:20,000 and undertaken by the Andalusia Government.This DEM presented an estimated vertical accuracy of 1.34 m measured as root mean square error (RMSE).Finally, publicly available information, 0.5 m GSD aerial orthophotographs (true color RGB) were also used in this work.They were taken and made in 2010 as part of the Spanish Programme of Aerial Orthophotography (PNOA).

Pre-Processing of Data Sets
From the WV2 ORS2A stereo pair, two PAN orthoimages (one from the 22.4° off-nadir image (WV22) and the second from the 10° off-nadir one (WV10; see Figure 2a)) were generated by using the Andalusia DEM as ancillary data.Seven ground control points (GCPs) were always used to compute the sensor model based on rational functions refined by a zero order transformation in the image space (RPC0).In the same way, two pan-sharpened orthoimages were also generated from the corresponding images, WV10 and WV22 (with 0.5 m GSD and containing the spectral information gathered from the 8-band MS image) by applying the PANSHARP algorithm included in Geomatica 2013 (PCI Geomatics, Richmond Hill, ON, Canada).Both PAN and pan-sharpened orthoimages were made up of 4690 × 5800 pixels.The planimetric accuracies (RMSE 2D ) computed on the orthorectified PAN and pan-sharpened images were 0.56 m and 0.86 m for WV10 and WV22 respectively (see the work recently published by Aguilar et al. [38] for more information about the orthorectification process).
A 1-m grid spacing DSM covering the whole study area was extracted from the WV2 PAN stereo pair by means of the photogrammetric software package, OrthoEngine, also included in Geomatica 2013 (Figure 2b).It is noteworthy that any editing process was applied to the DSMs compiled from VHR satellite imagery in order to keep their production as hands-off as possible.The corresponding DSM vertical accuracy, measured as standard deviation, was ranging from 0.53 m over flat areas to 2.74 m over urban areas [34].A normalized digital surface model (nDSM) with 1-m grid spacing was calculated by subtracting the Andalusia DEM from the WV2-derived DSM (Figure 2c).From the GE1 stereo pair and following the same procedure applied to WV2, two PAN and two 4-band pan-sharpened orthoimages (from the images with 23.1° (GE23) and 8.5° (GE9) off-nadir) were produced by using the Andalusia DEM, seven GCPs and the RPC0 sensor model.The RMSE 2D estimated over the PAN and pan-sharpened orthoimages were 0.48 m and 0.66 m for GE9 and GE23, respectively [38]. Figure 2d shows the corresponding PAN orthoimage from GE9 image.As in the case of WV2, an nDSM (Figure 2f) was computed by subtracting the Andalusia DEM from the GE1 PAN stereo pair DSM (Figure 2e).The standard deviation of this GE1-derived DSM ranged from 0.39 m over flat areas to 2.67 m over urban areas [34].
A flowchart is depicted in Figure 3 in order to clarify the pre-processing methodology carried out to generate both orthoimages and nDSMs data sets from VHR satellite stereo pairs.

Manual Segmentation and Reference Map
The first step in OBIA consists of image segmentation to produce homogeneous and discrete regions, objects or segments, which will be used later as the classification units instead of pixels [24,39].VHR satellite imagery segmentation is a crucial step for attaining high accuracies at the final OBIA classification [40,41].Anyway, generating segmentations from a multiangle data set (two different off-nadir viewing angle orthoimages from each stereo pair in our case) presents a unique set of challenges related to the geometric changes of elevated objects projections (e.g., buildings or greenhouses) depending on the different off-nadir angles (perspective view).For that reason, it is necessary to undertake an individual segmentation for each image belonging to the same stereo pair.However, this turns out to be excessively time consuming, and the OBIA classification from all multiangle images would be much more complex, as the objects created from one image do not fully match those coming from the other.Besides, the use of individual segmentations for orthoimages of GE1 and WV2 would hamper the direct comparison between both sensors.Hence, only a single manual segmentation of the study area was conducted, and it was used for all orthoimages.The 406 objects making up this segmentation were generated by manual digitizing performed over the PNOA's aerial orthoimage at an approximate scale of 1:500 (Figure 4).These digitized objects always belonged to the nine classes shown in Figure 4a.
Cropped greenhouses and uncropped greenhouses included the plastic-covered greenhouses according to whether or not they were housing horticultural crops when the satellite images were taken.Cropped nets and uncropped nets were comprised of mesh-covered greenhouses with or without horticultural crops underneath.Dense nets could be defined as greenhouses covered by a green thick mesh material.The building class consisted of white buildings ranging from small sheds to large industrial buildings.Finally, the last three considered classes were vegetation, orchards and bare soil.Since our main goal was focused on greenhouse classification, both with plastic film and shadow mesh, all greenhouses present in the study area were digitized and assigned to classes cropped greenhouses, uncropped greenhouses, cropped nets, uncropped nets and dense nets.Since in some cases, the spectral differences between cropped and uncropped greenhouses covered by either plastic or mesh material were small (mainly depending on the type and the stage of development of the crop underneath), it was decided to group all these classes in only greenhouses and nets (Table 2).It should be borne in mind that the ground-truth or reference map was only based on the orthoimages from PNOA, GE1 and WV2, as well as very valuable information retrieved from the Google Street View tool (i.e., there was no field campaign in August 2011).The classes, vegetation, orchards, bare soil and buildings, were selected for their spectral similarity to greenhouses and nets (Table 2).According to Van der Wel [10], the class, greenhouses (referring to plastic greenhouses), could be confused with white buildings.In the same way, Sanjuan [13] reported problems discriminating between nets (greenhouses covered with shadow meshes) and bare soil.Furthermore, cropped net greenhouses with leafy and overgrown crops underneath could be confused with vegetation, as can be seen in Figure 5, where an outdoor lettuce crop can be appreciated on the left-hand side of the road, while on the right, a cropped net greenhouse can be distinguished.The six final classes selected in this work are very similar to the ones chosen by Tarantino and Figorito [9], although they did not consider white buildings.Figures 6 and 7 show an industrial building of about 9 m high and a very similar 4-m high plastic greenhouse over the four pan-sharpened orthoimages.It is noted that the objects digitized over the PNOA's orthoimage (blue line) do not always match well the object boundaries, which can be made out on satellite orthoimagery, due to different off-nadir view angles.This effect is even more noticeable in the objects with higher height and smaller area (see the little building digitized on the northwest corner of Figure 7).

Classification Procedure
The OBIA software used in this research was eCognition 8.0.Once the manual segmentation with 406 objects has been transferred to eCognition by using a previously digitized vector file as a thematic layer, it comes time to tackle the classification phase.A very well-known classifier, like nearest neighbor (NN), which is included in eCognition, was applied in this work.NN is a non-parametric supervised classifier, which stands out because of its simplicity and flexibility.
A subset of 42 well-distributed objects (approximately 10% of the 406 previously digitized objects) was selected to carry out the training phase, whereas the remaining 364 objects, also evenly distributed on the working area, were used for the validation phase (Figure 4b).In a similar work, Tarantino and Figorito [9] used 30 training samples to train the eCognition's NN classifier.Moreover, Aguilar et al. [20] reported that, in statistical terms, the most efficient choice dealing with urban areas classification from VHR imagery would be about 10% of training samples.In our case, the distribution among classes of both subsets is shown in Table 2.The NN classification was initially undertaken on the nine classes depicted in Table 2, but the final results were computed based on the final six classes by simply aggregating the classes corresponding to greenhouses and nets.
The object features selected to perform the supervised classification are presented and described in Tables 3 and 4.More in-depth information about these features can be found in the Reference Book of Definiens eCognition Developer 8 [42].CRI, EVI and VOGRE in Table 3 are vegetation indices used by Oumar and Mutanga [43].In the case of GE1, three different tests were carried out.The first of them was based on the GE9 PAN and pan-sharpened orthoimages (Single GE9).The second one used the GE23 PAN and pan-sharpened orthoimages (Single GE23).Finally, the third test considered all the information included in the two first cases, i.e., it used features from GE9 and GE23 orthoimages (stereo GE1).
Moreover, the features from image objects were grouped into 10 strategies for classification: (1) Super basic (SB): This set only includes the mean values of B, G, R and NIR bands.It has 4 features for both single GE9 and single GE23 and, therefore, 8 features for stereo GE1.
(2) Super basic and elevation (SB + nDSM): It is composed of the SB's features and the mean and standard deviation values computed from the nDSM within every object.It counts on 6 features for both single GE9 and single GE23, and 10 features for stereo GE1 (notice that the mean and standard deviation values coming from the nDSM have the same value for both single images).deviation values computed from the nDSM.It has 10 features for both single GE9 and single GE23 and 18 features for stereo GE1 (note that the mean and standard deviation values from nDSM are the same for both single images).( 7) Basic plus shape and geometry (B + Sh): It is composed of the B's features plus the seven features corresponding to shape and geometry (Table 4).It has 15 features for both single GE9 and single GE23 and 23 features for stereo GE1 (note that the seven features related to shape and geometry present the same value for both single images).( 8) Basic and texture (B + T): This set includes the B strategy plus five gray-level co-occurrence matrix (GLCMs) texture measurements computed on a PAN orthoimage.Only five out of the 14 GLCM texture features originally proposed by Haralick et al. [44] are considered in this work, due to both the strong correlation frequently reported between many of them [45] and their large computational burden.The same subset of GLCM texture features (i.e., contrast, entropy, mean, standard deviation and correlation) was selected by Stumpf and Kerle [46] working on a similar OBIA workflow.Therefore, the B + T set has 13 features for both single GE9 and single GE23 and 26 features for stereo GE1.(9) Basic plus nDSM and band indices (B + BIs + nDSM): This set includes the features corresponding to B + BIs and B + nDSM strategies.It has 13 features for both single GE9 and single GE23 and 24 features for stereo GE1 (the mean and standard deviation values from nDSM are the same for both single images).(10) All features (All): This strategy is composed of the sum of the following groups of features: (i) mean, standard deviation and ratio to scene for R, G, B, NIR and nDSM; (ii) Brightness_4; (iii) NDBI, NDGI and NDVI; (iv) 7 shape and geometry features; and (v) 5 texture features.Thus, it has 31 features for both single GE9 and single GE23 and 51 features for stereo GE1.
In the case of stereo GE1, the features from nDSM and, shape and geometry are the same for both single images.Moreover, brightness is computed for each object as the mean value of the 8 bands involved (i.e., 4 RGBNIR bands from each image making up the stereo pair).
In the case of WV2, six different tests were conducted.On the one hand, three of them were carried out by only using the four classical bands included in WV2 MS imagery, i.e., R, G, B and NIR1 (it is noted that the NIR1 band in WV2 would correspond to the NIR band in GE1).In those cases, the information corresponding to the newest bands (i.e., coastal, yellow, red edge and NIR2) was not considered.These first three tests were undertaken in exactly the same way as single GE9, single GE23 and stereo GE1, and again, the same aforementioned 10 strategies for GE1 classification were applied.These tests were named as: (i) single WV10-4, based on WV10 orthoimages, but only using four bands; (ii) single WV22-4, using 4-band WV22 orthoimages; and (iii) stereo WV2-4, using 4-band orthoimages from both WV10 and WV22.On the other hand, the last three tests were conducted by using the complete 8 bands included in the MS WV2 imagery, and they were named single VW10-8, single WV-22-8 and stereo WV2-8.
Once again, 10 strategies for classification were applied in a similar way to GE1 or WV2 4-band cases.They were the following: (1) SB: 8 features for both single WV10-8 and single WV22-8 and 16 features for stereo WV2-8.
(2) SB + nDSM: 10 features for the single tests and 18 features for stereo WV2-8.The number of features and strategies used in the classification stage are summarized in Table 5 for a better understanding for the reader.
Regarding non-parametric supervised classification methods, comparative studies have shown that classification accuracy obtained from more sophisticated methods, such as support vector machines (SVM), may outperform NN [48][49][50].Besides, SVM usually turns out to be more stable in high-dimensional feature spaces needing a small training sample size [51,52].However the SVM model's classification accuracy is largely dependent on the selection of the model's parameters [53].In short, SVM methods try to find a hyperplane that splits a data set into two subsets during the training phase [54].Preliminarily, the input features are mapped into a higher-dimensional space through a suitable kernel function to obtain that hyperplane.Herein, we used a radial basis kernel function (RBF) and estimated the kernel parameter (γ) and the penalty parameter (C) through cross-validation from the training data [53].The free-distribution library, LIBSVM [53], was employed for applying the SVM classifier to the aforementioned 9 tests (i.e., single GE9, single GE23, stereo GE1, single WV10-4, single WV22-4, stereo WV2-4, single WV10-8, single WV22-8 and stereo WV2-8), but only using the strategy named All features, which presents the most high-dimensional feature space.Taking into account the complex experimental design accomplished in this work, a synthetic flowchart is presented in Figure 8 in order to clarify the classification procedure section.

Accuracy Assessment
Summing up, 9 tests and 10 different strategies (i.e., 90 classification projects) were conducted in this work by applying the NN algorithm.Furthermore, the SVM classifier was tested for the 9 tests, but only for the strategy named All features (i.e., 9 classification projects).All the classifications tasks carried out in this study were performed over the same training objects (42), and the corresponding accuracies were always assessed against the same testing objects (364).
The classification accuracy assessment undertaken in this work was based on the error matrix [55].Hence, the accuracy measures, always computed on the same validation set, were the following: user's accuracy (UA), producer's accuracy (PA) and overall accuracy (OA).
Finally, the F β measure [19,56], which provides a way of combining UA and PA into a single measure, was also computed according to the Equation ( 1), where the parameter, β, determines the weight given to the user's and producer's accuracies.The value used in this study (β = 1) weighs UA equal to PA.

Classification Accuracy with Regard to the Strategies
The obtained land cover classification results from using different tests and strategies, but always applying the NN algorithm included in eCognition, are depicted in Figure 9.The overall accuracies values attained from GE1 projects (Figure 9a), WV2 using only the four classical bands (Figure 9b) and WV2 using all eight bands (Figure 9c), showed similar behavior.Regarding the strategies tested along this work, and as a general rule, B + nDSM (Strategy 6), B + BIs + nDSM (Strategy 9) and All features (Strategy 10) outperformed the others.Both BIs (mainly NDVI, NDBI and NDGI) and especially nDSM added valuable information to the basic spectral data represented here by the strategy, SB (Strategy 1, where only spectral band mean values are considered) and B (Strategy 3, where mean and standard deviations are included).In fact, the average OA of the nine tests was improved by approximately 3.6 percentage points when nDSM information was added to the B strategy, while a much more modest increase of 1.8% was attained by using BIs features.The importance of height data in the classification results can also be quantified by observing the large difference in overall accuracy (close to 9%) between the attained results in SB and SB + nDSM strategies for all the considered tests.In this way, but working on an urban environment, an improvement of 13% in the F-measure (F β ) was reported by Longbotham et al. [19] when height data compiled from a VHR stereo pair was added to the eight-band mean values of the single nadir pan-sharpened image of WV2.They achieved even larger increases when the target classes presented higher heights (e.g., skyscrapers or buildings).
Digital height maps have already been extracted from VHR stereo pairs for helping to the classify tasks.For example, Mahmoudi et al. [21] and Tian et al. [36] used a robust stereo matching algorithm based on semi-global matching (SGM) developed by Hirschmüller [57] on stereo pairs of IKONOS and WV2.Anther dense matching method, implemented in the BAE Systems' Socket Set NGATE module, was used by Longbotham et al. [19] to extract DSM from WV2 multiangle images.A more conventional matching algorithm included in the OrthoEngine Module of PCI Geomatica was also used by Koc-San and Turker [35] over IKONOS stereo pairs.This last algorithm coincided with the matching algorithm applied in our work to attain the DSM.Although it usually performed quite well for both GE1 and WV2 stereo pairs, as can be seen in Figure 2b,e, it presented several problems working on greenhouses covered by shadow nets and also in urban areas (Figure 10).The last issue has been already reported by Aguilar et al. [34].Just to give the reader an idea of the global behavior of the matching algorithm used in this work, Figure 11 shows the mean height of each object computed from GE1 nDSM with regard to its area (the results for WV2 turned out to be very similar).It can be noted that the objects belonging to plastic greenhouses presented a very homogeneous height around 4 m, indicating that the matching algorithm worked very well on this type of surface.In the case of greenhouses covered by shadow nets, the computed mean heights per object were much more heterogeneous, ranging from zero to 4 m.Finally, and dealing with buildings, problems due to an inaccurate nDSM were only evident when working with small objects.In fact, buildings presenting a size bigger than 2,000 pixels were assigned nDSM mean values ranging from 5.6 m to 11.3 m in the case of the GE1 stereo pair (from 4.9 m to 11.5 m in the case of WV2 stereo pair).
Regarding texture features, the second-order textural parameters tested in the Strategy 8 (B + T) did not clearly improve the results achieved by the B strategy.However, Longbotham et al. [19] achieved improvements higher than 10% by adding to the original spectral information several texture features calculated from the WV2 PAN band, such as homogeneity, contrast, dissimilarity, entropy, second moment and correlation.In our work, textural information seems to be totally covered by the standard deviation values of the spectral bands as a first-order texture measure.
Strategy 5 based on ratios to scene (B + Rs) hardly had any influence on the classification results.In addition, shape and geometry features included in Strategy 7 not only failed to improve overall accuracy values achieved in the B strategy, but they significantly worsened it.

Classification Accuracy with Regard to the Tests
Considering that manual segmentation was performed on an aerial orthoimage, the digitized objects should geometrically match the nearest nadir satellite orthoimagery better than those with a larger off-nadir (see Figures 6 and 7).Thus, as for the tests that consider only single images, the classification accuracy should be better for projects based on orthoimages presenting the smallest off-nadir (i.e., single GE9, single WV10-4 and single WV10_8).However, disregarding the heterogeneous Strategy 7, this fact only occurred for the SB and SB + nDSM strategies.Focusing on the three best strategies (i.e., B + nDSM, B + BIs + nDSM and All features), OA values were always worse in the tests carried out with near-nadir orthoimages, especially in the case of GE1.For instance, using the B + nDSM strategy as a reference, OA values of 89.01% and 84.89% were attained for single GE23 and single GE9, respectively.In the WV2 tests, OA values of 87.91% and 86.26% were computed for single WV22-4 and single WV10-4, while 87.39% and 86.81% were achieved for single WV22-8 and single WV10-8, respectively.In an attempt to understand these unexpected results, the per-class accuracy is shown for the three main strategies in Figure 12.It can be noted that the OA improvements corresponding to the tests with the higher off-nadir orthoimages were mainly due to a better effective discrimination between greenhouses and buildings classes and, so, increasing the F β measure in both classes.In fact, misclassification between these two classes often occurred in small buildings, which were finally classified as greenhouses.In this way, Figure 13 shows the classification results of a detailed area inside our study area, where the existing seven small buildings are highlighted in red (Figure 13b).Four and three out of seven buildings were misclassified in single GE9 (Figure 13c) and single WV10-8 (Figure 13d), respectively.Bearing in mind that the displacements of the roof of the small buildings due to changes in off-nadir are rather variable (see the small building located in the upper right corner, Figure 7), the features computed for these small objects in each test also become very heterogeneous.In our work, and due to this edge effect, it seems that the small buildings in the tests presenting a high off-nadir offered unique spectral characteristics, and this fact improved the discrimination between greenhouses and buildings classes.However, this effect strongly depends on several factors, such as the segmentation process, the sun and satellite position and the elevation at the time of the image collection.Anyway, confusion between small buildings and greenhouses can be easily solved in operational conditions by introducing a threshold condition involving the area of these objects.A similar procedure was conducted by Agüera et al. [7] and Arcidiacono and Porto [58].
Regarding the classification results coming from adding multiangle orthoimages spectral information (i.e., stereo GE1, stereo WV2-4 and stereo WV2-8), no significant improvement with respect to the single tests was achieved.Furthermore, OA from stereo GE1 were worse than single GE23 for the best three strategies.In general, OA from stereo tests achieved the best results when the strategy All features was used.In the last case, the per-class accuracies measured as F β (Figure 12) were much more homogeneous for each of the six classes (F β values lower than 80%).So far, and to the best knowledge of the authors, multiangle satellite images had been used by Longbotham et al. [19] and Duca and Del Frate [37] for the generation of land cover maps.However, the last works were carried out by using pixel-based classification schemes, so they did not have to face the problem of segmentation, which is vital in an OBIA approach.Another novel contribution of this paper was the comparison between the performance of newly developed VHR satellite imagery from WV2 and GE1 to classify land cover in mainly intensive agricultural environments with emphasis on greenhouse detection.Looking at Figure 12, any clear difference can be observed in the classification accuracy achieved for both GE1 and WV2-4 tests.Moreover, the new spectral bands of WV2 could not provide a significant improvement, neither in OA nor in per-class accuracy.Regarding the classes related to greenhouses (i.e., greenhouses and nets), they reached very acceptable F β values ranging from 82% to 95% for any single test involving GE23 or WV22 orthoimages.According to Marchisio et al. [59], the use of the eight-band WV2 yielded improvements in classification accuracy figures ranging from 5% to 20% for certain land-cover types, such as man-made materials, selected vegetation targets, soils and shadows.In fact, Pu and Landry [47] compared VHR imagery from IKONOS and WV2 to identify and map urban forest tree species, finding that the 8-band WV2 significantly increased the accuracy of identifying tree species with respect to IKONOS data.In the same way, Koc-San [22] reported that using all 8 bands of WV2 satellite imagery was important to differentiate glass and plastic greenhouses.By contrast, Aguilar et al. [20] did not find improvements in classification accuracy between GE1 and WV2 imagery when working on urban environments, whereas Marshall et al. [60] concluded that there was no benefit in using the additional four bands of WV2 to discriminate invasive grass species.Finally, the best OA values achieved in this work by using OBIA techniques on a very heterogeneous greenhouse area were close to 90% for both GE1 and WV2 cases.As it was mentioned before in the introduction section, few papers can be found in the literature related to greenhouse detection by using different pixel-based approaches.Although Agüera et al. [7] reported a greenhouse detection percentage value of 91.45% working with QuickBird images, it was achieved after applying two algorithms to refine the raw classification (both to eliminate loops and to define the sides of greenhouses).The same authors [5] compared QuickBird and IKONOS imagery in plastic greenhouse detection and analyzed the effect of texture in the pixel-based classification accuracy.They concluded that the addition of texture information in the classification process did not improve the classification accuracy, and the best results in terms of greenhouse detection percentage took values of 89.61% and 88.26% for QuickBird and IKONOS, respectively.Working on WV2 pan-sharpened imagery, Koc-San [22] reported OA values from 87.80% to 93.88% by using different pixel-based classification techniques, such as maximum likelihood, random forest and SVM.Arcidiacono and Porto [15] reported a greenhouse detection percentage of 85.71% by using RGB VHR satellite bands only.This accuracy was improved to 88.42% by adding texture descriptors.In the same way, Arcidiacono and Porto [28] compared both pixel and object-based approaches to classify greenhouses on RGB aerial images.The best accuracy results ranging from 83.15% to 94.73% were attained by using OBIA techniques (in this case, the commercial software, VLS Feature Analyst ® for Leica Erdas Imagine ® ).Last, but not least, Tarantino and Figorito [9] achieved an OA of 90.25% by using OBIA classification implemented in eCognition for mapping plastic-covered vineyards from aerial photographs.
In summary, a lot of approaches (mainly grouped into pixel-based and OBIA techniques) have already been applied to greenhouse detection on VHR aerial or satellite imagery.However, the classification accuracy always was around 90%. Taking into account that the study areas in the aforementioned works were very changeable, it is difficult to properly compare these results.In this sense, it is important to consider the differences in the cost of each type of imagery and also the specialized software needed to perform each approach.

Classification Accuracy with Regard to the Classifier
Classifications algorithms frequently used in object-based approaches, as is the case of NN, do not perform well on a high-dimensional feature space, due to problems related to feature correlation (the widely known curse of dimensionality).In this section, the SVM classifier, which had previously shown a good performance in dealing with large number of features [49], was compared to the NN classifier for all features strategy (i.e., nine tests).Although OA values were very similar for NN and SVM classifiers, they were always slightly better when NN was applied.Table 6 shows the accuracy results in terms of PA and UA measures, but only for the stereo tests, which presented the most high-dimensional feature space.For each row, the bold values highlight the best accuracies for both PA and UA, respectively.On the whole, NN performed quite well with bare soil, greenhouses and nets, whereas SVM only outperformed NN in the case of the buildings class.The misclassification between vegetation and net classes (mainly with cropped shadow nets), which had been very significant when using Strategies 6 and 9 (Figure 12), seems to have decreased when the All features strategy was used with NN.However, the performance of SVM when classifying vegetation and orchards was quite poor.

Conclusions
This work compares the capabilities of stereo pairs from GE1 and WV2 VHR satellite imagery for land cover classification with emphasis on detecting greenhouses covered by plastic films or shadow nets.To this end, features based on spectral information coming from both PAN and pan-sharpened orthoimages and a digital height map extracted by stereo matching from satellite imagery were used together through an OBIA approach.As far as we know, this is the first work to use single-source height and multispectral information coming from both images of each stereo pair (i.e., multiangle orthoimages) for the purpose of greenhouse classification.
The study was implemented in an agricultural area around the village of Palomares, Almerí a (Spain).Likely, the study site counted on a very low number of greenhouses, especially those covered by plastic film.Furthermore, these greenhouses were very heterogeneous, both in terms of type of covering and the crop underneath.
The segmentation process was carried out by manually digitizing on aerial orthoimages.In this way, the digitized objects could be used for all the tests conducted in this work, involving orthoimages with different off-nadir from both WV2 and GE1.In further practical applications, and working with the two multiangle images of one stereo pair by using OBIA techniques, the segmentation step could be automatically performed on the most vertical or near-nadir orthoimage.Furthermore, thematic vector layers might be also used if they were available.
The overall accuracies attained by the NN classifier from GE1 tests (four MS bands) turned out to be very similar to the ones computed from WV2 (either four or eight MS bands) for every set of features tested.In general, nDSM was the most important feature for greenhouse classification.In fact, the three best strategies (i.e., B + nDSM, B + BIs + nDSM and all features) included height data extracted from the VHR stereo pair.The best overall accuracy values were close to 90%, and they were not improved by the use of multiangle orthoimages (stereo tests).The shape and geometric features (B + Sh), ratios to scene (B + Rs) and texture features based on GLCM (B + T) did not contribute to improving the classification attained by the basic spectral features strategy (B).
Disaggregating the results by class, the behavior of the three single tests performed with the near-nadir orthoimages (i.e., single GE9, single WV10-4 and single WV10-8) became very similar when applying the three best strategies.In these tests, there were a lot of misclassification problems between greenhouses and buildings classes, presenting F β indices usually lower than 75%.In fact, many small buildings were misclassified as plastic greenhouses.Curiously, this problem was partially solved in the single tests using the higher off-nadir orthoimages, due to the significant displacements of the roofs of the small buildings.In this way, these small objects, including pixels from other land cover in their edges, presented unique characteristics that helped for their correct classification.Anyway, and regarding an operational case, confusion between small buildings and greenhouses can be easily solved considering the area of these objects (i.e., introducing a threshold condition).More difficult to solve would be the misclassification between vegetation and nets (when the last are cropped), especially taking into account that the DSM extracted over mesh greenhouses was not very accurate.
Finally, NN and SVM non-parametric classifiers were compared through the All features strategy.The theoretical superiority of the SVM approach, as compared to NN, was not confirmed in this study.In fact, OA values were always slightly better when NN was applied.
Further works should be focused on testing new dense matching algorithms to improve the DSM extraction, especially on greenhouses with mesh covering and urban areas.Furthermore, the accuracy of the automatic segmentation process applied to the greenhouse area poses a great challenge.We are now planning to address both topics in new Mediterranean study sites mainly dedicated to greenhouse growing.

Figure 1 .
Figure 1.The location of the study area (red rectangle) using aerial orthophotographs from the Spanish Programme of Aerial Orthophotography (PNOA) in 2010.Coordinate system: WGS84 UTM zone 30N.

Figure 3 .
Figure 3. Flowchart regarding the pre-processing stage of data sets used in this work.

Figure 4 .
Figure 4. Location of the 406 objects digitized over the PNOA's orthoimage.(a) Classes assigned to each object; (b) training and validation sets.

( 3 )
Basic (B): This set includes the mean and standard deviation values computed from the B, G, R and NIR bands.Therefore, it accounts for 8 features for both single GE9 and Single GE23 and 16 features for stereo GE1.(4) Basic and band indices (B + BIs): This strategy is composed of the B's features and the three normalized difference indices (NDBI (Normalized Difference Blue Index), NDGI (Normalized Difference Green Index) and NDVI).It adds up 11 features for both single GE9 and single GE23 and 22 features for stereo GE1.(5) Basic and ratios to scene (B + Rs): This set includes the B strategy plus four ratios to scene for R, G, B and NIR.It has 12 features for both single GE9 and single GE23 and 24 features for stereo GE1.(6) Basic and nDSM (B + nDSM): It is composed of the B's features and the mean and standard (3) B: 16 features for the single tests and 32 features for stereo WV2-8.(4) B + BIs: 22 features for the single tests, i.e., using 6 band indices, such as NDBI_NIR2, NDGI_NIR2, NDVI_NIR2, NDYI_NIR2, NDREI_NIR2 and NDCI_NIR2, and so, 44 features for stereo WV2-8.(5) B + Rs: 24 features for the single tests and 48 features for stereo WV2-8.(6) B + nDSM: 18 features for the single tests and 34 features for stereo WV2-8.(7) B + Sh: 23 features for the single tests and 39 features for stereo WV2-8.(8) B + T: 26 features for the single tests, i.e., using 10 GLCMs from NIR1 and NIR2, as recommended by Pu and Landry [47], and 52 features for stereo WV2-8.(9) B + BIs + nDSM: 24 features for the single tests and 46 features for stereo WV2-8.(10) All features: 57 features are used for the single tests (i.e., single VW10-8 and single WV-22-8),including mean, standard deviation and ratio to scene for WV2 8-band and nDSM, plus Brightness_8, plus 12 band indices, plus 7 shape and geometry features and plus 10 texture features.In the case of stereo WV2-8, 103 features are considered.

Figure 8 .
Figure 8. Flowchart of the classification procedure.

Figure 9 .
Figure 9. Overall accuracy values from classification assessment by using the NN classifier.(a) Tests based on GE1 imagery; (b) tests based on WV2 imagery only using the four classical bands; and (c) tests based on WV2 imagery using all the eight MS bands.

Figure 10 .Figure 11 .
Figure 10.(a) Pan-sharpened orthoimage from GE9 focused on greenhouses; (b) DSM of greenhouses from the GE1 stereo pair; (c) DSM of greenhouses from the WV2 stereo pair; (d) pan-sharpened orthoimage from GE9 corresponding to an urban area; (e) GE1's DSM of the urban area; and (f) WV2's DSM of the urban area.

Table 2 .
Characteristics of WV2 and GE1 stereo pairs acquired at the study site.

Table 3 .
Image object features based on basic spectral information and band indices used in the classification phase.The NIR band of GE1 is expressed as NIR1.

Table 4 .
[42]e object features based on texture, shape and geometry used in the classification phase.More information about these features can be found in[42].GLCM, gray-level co-occurrence matrix.

Table 5 .
Number of features used in the classification stage for each strategy and test.SB, super basic; BI, band index; Sh, shape and geometry; Rs, ratio to scene; T, texture.

Table 6 .
PA and UA per class attained for each stereo test by using the NN or SVM classifier and the all features strategy.