Benchmarking the Applicability of Ontology in Geographic Object-Based Image Analysis

In Geographic Object-based Image Analysis (GEOBIA), identification of image objects is normally achieved using rule-based classification techniques supported by appropriate domain knowledge. However, GEOBIA currently lacks a systematic method to formalise the domain knowledge required for image object identification. Ontology provides a representation vocabulary for characterising domain-specific classes. This study proposes an ontological framework that conceptualises domain knowledge in order to support the application of rule-based classifications. The proposed ontological framework is tested with a landslide case study. The Web Ontology Language (OWL) is used to construct an ontology in the landslide domain. The segmented image objects with extracted features are incorporated into the ontology as instances. The classification rules are written in Semantic Web Rule Language (SWRL) and executed using a semantic reasoner to assign instances to appropriate landslide classes. Machine learning techniques are used to predict new threshold values for feature attributes in the rules. Our framework is compared with published work on landslide detection where ontology was not used for the image classification. Our results demonstrate that a classification derived from the ontological framework accords with non-ontological methods. This study benchmarks the ontological method, providing an alternative approach for image classification in the case study of landslides.


Introduction
In remote sensing image analysis, both traditional per-pixel and emerging object-based approaches exist. In a per-pixel method, single pixels are assigned to different geographic classes based on their reflectance values from different spectral bands without employing additional and potentially useful spatial, geometrical or contextual features [1]. Per-pixel approaches operate at the spatial scale of the pixel. This poses the problem of mixed pixels, where a pixel represents more than one type of image object. The use of per-pixel approaches is diminishing as the resolution of satellite images increases. One of the reasons is the "salt and pepper" effect, where single pixels are misclassified within a group of pixels representing a certain class [2,3].
There has been a shift from per-pixel to object-based methods [4,5]. Geographic Object-Based Image Analysis (GEOBIA) combines contiguous homogeneous pixels to segment geographical objects in remote sensing images. The grouping of pixels is carried out using segmentation algorithms [6]. After segmentation, feature values of the segmented image objects are extracted. Using these feature values, rule sets are developed to classify image objects into classification categories. In contrast to per-pixel methods, object-based methods can be applied at multiple scales and make use of spatial, contextual, and textural features alongside spectral features for image object classification.
The GEOBIA approach relies on expert knowledge to perform image segmentation and classification [5,7]. Such expert knowledge, if systematically organised, can guide future image analysis [8]. However, GEOBIA methods currently lack a systematic method to conceptualise and formalise domain knowledge. With a high dependency on human experts and a lack of formal knowledge, GEOBIA processes are highly subjective and largely irreproducible [9]. Image object extraction and classification can be biased by human subjectivity and vary depending on the users' capabilities and experiences [8]. With knowledge formalisation, the classification process is prescribed and automated with less human intervention, using consensus knowledge developed and approved by domain experts.
Knowledge representation languages can be used to formalise a domain expert's knowledge and reduce the issues of subjectivity, automation and transferability. A knowledge representation approach can employ logic-based formalism or non-logic-based representation. In a logic-based approach, the representation language is usually a variant of first-order predicate calculus and reasoning is based on formal verification of logical consequences [10], while, in a non-logical approach, the knowledge is represented using specialised data structures and reasoning is performed by applying specialised procedures on the structures [10]. A semantic network represents a non-logical approach with a specialised network structure in which a graphical network of nodes and arcs is used for knowledge representation. The graphical representation shows the semantic relations between concepts, which can be used to create and share the knowledge of thematic experts [11]. Ontology, as a logic-based approach, has well-defined and formal semantics to represent knowledge.
In our work, we have used ontology as a knowledge representation language that provides a representation vocabulary specialised to a certain domain or subject. The highly cited definition of ontology from Gruber [12] states that an ontology is a "specification of a shared conceptualisation". Ontology is a shared understanding of a domain agreed by experts and intended to make domain knowledge interoperable, reusable and shareable [13]. The Web Ontology Language 2 (OWL 2) [14] is a machine-readable knowledge representation language for authoring and sharing ontologies. The logical consistency of an ontology can be established using reasoning engines such as Pellet [15], FaCT++ [16] and KAON2 [17]. The Semantic Web Rule Language (SWRL) [18] allows the creation of conditional rules supplementing the capability of the reasoner. SWRL rules are executed using the semantic reasoner, which discovers new entailments and incorporates them into an existing ontology.
In this work, we attempt to develop an ontological framework where domain knowledge is formalised to assist in rule-based classification. For this, the domain ontology is constructed in OWL and the rules are written in SWRL. Using a semantic reasoner, the instances of segmented image objects are assigned to concepts representing domain entities to perform an image object classification.
In remote sensing, classification of image objects is highly dependent on the knowledge of domain experts. However, the use of that knowledge is limited when it is not formalised, because the knowledge becomes incomprehensible and unsharable [19]. There is a need for a knowledge organisation and representation method to handle the inefficient and excessive dependency on expert knowledge [20]. In short, GEOBIA needs to adopt knowledge formalisation techniques that can reduce human involvement and bring transferability to image classification. The use of ontology for formalising expert knowledge has been explored in [19]. In GEOBIA, an ontological framework helps in data discovery, automatic image interpretation, data interoperability, workflow management and data publication [9]. Previous studies show that ontological frameworks have been developed for land cover extraction [20-22], ocean image classification [23], and biodiversity monitoring [24].
In this paper, we aim to answer the following research questions:
• Can ontology be used to formalise domain knowledge in the manner required by GEOBIA for image classification?
• What methodological changes are required to apply the formalised knowledge as a spatial ontology in object-based image classification?
This paper is a methodological contribution in the field of geo-spatial image analysis with a focus on ontological implementation. The specific contributions of the formalised ontology in this study are:
• transferability of knowledge with modularisation of highly transferable domain ontology and data dependent feature ontology;
• extensibility of the knowledge base with a clear separation of knowledge construction and classification tasks;
• minimisation of human intervention during image classification by developing prior rules with the consensus of experts;
• use of inferencing capability in image classification with a reasoner (Pellet); and
• data interoperability with the use of W3C (World Wide Web Consortium) standard languages (OWL, SWRL).
The remainder of this paper is organised as follows. Section 2 outlines the case study on landslides, the data and the study area. Section 3 proposes the methodology for ontology-based image classification. Section 4 presents the outcome of the experiment, which is then discussed in Section 5. Section 6 presents the conclusions of this work.

Case Study on Landslide Detection
Landslides are common and can have disastrous impacts. Rapid and accurate detection of landslides can help to monitor hazards, minimise risk and support disaster management. Earth observation data can play a significant role in the early detection and analysis of landslides [25,26]. For landslide detection, different remote sensing techniques such as aerial photo interpretation, stereoscopic image analysis, interferometry studies, and airborne laser scanning (ALS) can be used [27]. Aerial photographs are rich in data and allow comprehensive landslide analysis, but they are costly and time-consuming to acquire and, most importantly, may not be available immediately after a disaster. In contrast, high-resolution satellite images are an economical source of data that can provide timely information about affected landslide areas [28]. Recently, LiDAR data and its wide range of derivatives, including digital terrain models (DTM), shaded relief, slope, aspect, and surface roughness, have been used for qualitative interpretation and quantitative statistical analysis [26]. In this context, landslide detection through earth observation data is considered to be a promising application domain for image analysis.
Several studies have shown the applicability of knowledge-based image analyses to detect landslides [28-30]. However, there is a lack of formalised methodology to enable the reusability of the developed knowledge. The use of a knowledge representation language for managing domain knowledge in landslide detection is very rare. HazardMatch is a system developed for creating landslide susceptibility maps using semantic technology [31]. In that work, landslide experts describe the key properties of landslide-susceptible areas, and the landslide-relevant properties of potential areas are ranked by their similarity to those defined by the experts.
In this study, landslide detection is selected as the problem domain and a case study is developed using previously published work by Martha et al. [28], where the classification was carried out using geographic object-based image analysis, but without using a formalised knowledge representation technique such as ontology to classify different landslide classes. The work of Martha et al. [28] was chosen for the case study in order to benchmark the proposed ontological framework. We used the available remote sensing and landslide data, rule sets and classification results from [28] to evaluate an ontology-driven classification method. This benchmarking of the proposed framework tests its soundness. The data includes spectral, spatial and morphometric properties of landslides, which serve as sources for constructing a knowledge base in the landslide domain.

Landslides Class Description
In this work, five different types of landslides are identified based on their spectral, contextual and morphometric characteristics. The landslide types and their descriptions with feature criteria are sourced from [28] and provided in Table 1. In this table, the first column is the landslide class; the second column is a description in natural language; and the third column is the translation of these descriptions into measurable thresholds. For example, shallow translational rock slides are relatively narrow and elongated in shape, which means they are asymmetrical in shape. Thus, we can say that 'shallow translational rock slide' has a high asymmetry value, and this is translated to a threshold on a measurable feature using hasAsymmetry (≥ 0.5).

Data
IRS (Indian Remote-Sensing Satellite)-P6 (also known as ResourceSat-1) is an earth observation mission within the IRS series of ISRO (Indian Space Research Organisation). Multi-spectral image data acquired on 16 April 2004 was used to calculate spectral characteristics used in defining landslide classification rules. ResourceSat-1 is equipped with a high resolution Linear Imaging Self Scanner (LISS-IV) operating in three spectral bands, as shown in Table 2. A Digital Elevation Model (DEM) created from 2.5 m resolution stereoscopic Cartosat-1 data acquired on 6 April 2006 was used to extract morphometric derivatives. The Cartosat-1 optical satellite was launched by ISRO and is equipped with two panchromatic sensors, Pan-Aft and Pan-Fore, with −5° and 26° view angles, respectively. The satellite captures in-orbit stereo images simultaneously that can be used to generate a DEM.
In this study, the pre-processed satellite image data and the DEM with its derivatives were sourced from [28]. The satellite image from ResourceSat-1 was orthorectified using the 10 m DEM created from stereoscopic Cartosat-1 data. Spectral features such as NDVI (Normalized Difference Vegetation Index) and brightness from the optical data, and DEM derivatives such as slope, terrain, curvature, and hill shade, were calculated to characterise landslides and false positives.
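As a rough illustration of the spectral features mentioned above, the following sketch computes NDVI from near-infrared and red values using its standard definition. The brightness function is an assumption (a plain mean over band values), and the zero-denominator handling is a choice made for this example; neither is taken from [28].

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        # Assumption for this sketch: return 0.0 when both bands are zero.
        return 0.0
    return (nir - red) / denominator


def brightness(band_values: list[float]) -> float:
    """Assumed definition: mean of the band values of a pixel or object."""
    return sum(band_values) / len(band_values)


# Hypothetical reflectance values for one image object.
example_ndvi = ndvi(nir=0.5, red=0.1)
example_brightness = brightness([0.2, 0.1, 0.5])
```

Low NDVI values typically indicate sparse vegetation, which is why the index is useful for separating bare landslide surfaces from vegetated surroundings.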

Study Area
The study area covers a landslide-prone area of about 28 km² bounded between 30°31′30″ N and 30°34′30″ N latitude and 79°6′0″ E and 79°9′0″ E longitude, as depicted in Figure 1. It is located in the Madhyamaheshwar sub-catchment of Okhimath in the Uttarakhand state of India, at the confluence of the Mandakini and Madhyamaheshwar rivers. Elevation in Okhimath ranges from 1047 m to 2620 m, with an average of 1300 m. The region experiences a subtropical temperate climate with annual rainfall between 1200 mm and 1500 mm. This study area has a high potential for rainfall and earthquake-induced landslides, with past landslide occurrences [32,33].

Methodology
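Based on the principle of modularity, the proposed classification framework is broadly categorised into five modules (see Figure 2): segmentation and feature extraction, ontology development, threshold value extraction, ontology-based classification, and validation.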

Segmentation and Feature Extraction
In GEOBIA, segmentation is the first essential step, where an image is partitioned into objects that can be characterised and classified. Several image segmentation techniques exist, and the proposed framework permits the use of any segmentation technique because of its modular design. The segmentation process results in the creation of image objects featuring different characteristic attributes such as spectral, geometrical, or contextual attributes. Here, a significant improvement over a per-pixel approach can be realised, since an image object can carry spectral information in terms of mean, standard deviation, minimum and maximum values. The segmented object, as a collection of pixels, can have geometric attributes such as shape and size. Additionally, partitioned objects are contextually related by spatial relationships such as distanceTo, hasBorderWith, etc.
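The per-object spectral statistics described above can be sketched as follows. This is a minimal illustration that assumes an image object is available as a flat list of pixel values for one band; the function name and the use of the population standard deviation are choices made for this example, not details from the case study.

```python
from statistics import mean, pstdev


def spectral_features(pixel_values: list[float]) -> dict[str, float]:
    """Summarise one band of an image object by its basic statistics."""
    return {
        "mean": mean(pixel_values),
        "std": pstdev(pixel_values),  # population standard deviation
        "min": min(pixel_values),
        "max": max(pixel_values),
    }


# Hypothetical pixel values belonging to a single segmented object.
features = spectral_features([2, 4, 4, 4, 5, 5, 7, 9])
```

A single pixel carries only its own value, whereas the object-level summary above yields several attributes per band that classification rules can use.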
Feature extraction is the next step, in which different feature values are calculated for each of the image object segments. In our workflow, the calculated feature values are exported from module 1 (see Figure 2) in CSV (Comma Separated Values) format. The output of this module is segmented image objects with feature values.
In this case study, two different segmentation techniques were implemented using eCognition Developer Version 9.0.2 (Arnulfstrasse, Munich, Germany), namely, multiresolution [6] and chessboard segmentation [34]. Initially, a multiresolution segmentation was carried out based on multi-spectral (green, red, near-infrared) data to extract image objects. The segmentation was performed using a scale parameter of 10 and with shape and compactness parameters of 0.1 (shape + colour = 1) and 0.5 (compactness + smoothness = 1), respectively. Later, chessboard segmentation was used to further split the classified landslides into square objects with a size parameter of 2 pixels, to eliminate small patches of vegetation or barren land from larger landslide objects. Finally, multiresolution segmentation based on terrain curvature data instead of spectral data, with a scale parameter of 10, a shape parameter of 0.1 and a compactness parameter of 0.5, was used on those image segments classified as rock slides. This curvature-based segmentation highlights variations in concavity and convexity features, which are used to classify rotational and translational rock slides [28].
In general, there is a semantic connection between image objects, segmentation parameters and the data used. Thus, it is not just the classification that is driven by the ontology; it is the size and the shape of the segments as well (e.g., when we imagine a landslide of a particular class, we define it in terms of shape and size as well). The size and shape of the segments are outcomes of the chosen segmentation parameters as described above. As this work is focussed on the classification problem, the segmentation approach used in the previous work [28] is adopted and the segmentation parameters are not triggered by any rules based on ontology.

Developing an Ontology
Ontology development is essentially a process of identifying concepts and the relationships between concepts within a domain of knowledge. In module 2 (see Figure 2), we defined domain concepts, taxonomical structure, and properties to formally represent domain knowledge from experts. Protégé 5.2.0, an open source tool from Stanford University, was used to construct this ontology [35].
Studer [36] refined Gruber's [12] definition, describing ontology as a "formal, explicit specification of a shared conceptualization". In elaboration of this definition, conceptualisation is an abstract model of a world phenomenon identified by its concepts; shared means an ontology is consensual knowledge agreed by experts; explicit means the concept types and the constraints on their use are explicitly defined; and formal means that the ontology should be machine-readable. In this study, we have used OWL as a knowledge representation language for ontological formalism.
In our work, we followed the generalised steps proposed in [37]. These ontology development steps were devised for frame-based formalism, whereas we use description logic-based formalism (e.g., OWL) in our work. Taking the differences between the two formalisms into account [38], no facets and slots are created in steps 5 and 6 described below.
The following seven steps were followed in developing the landslide ontology:

Step 1 Determine the domain and scope of the ontology
The formal representation of a landslide was determined as the domain, with the scope limited to slope movement and material type.

Step 2 Consider reusing existing ontologies
To reuse an ontology, we searched for a relevant ontology with concepts that can define the landslide domain. No formalised ontology existed, but Varnes' landslide classification [39], shown in Table 3, does provide a taxonomy of landslides.

Step 3 Enumerate important terms in the ontology
An ontology uses terms to define the concepts and relationships that describe and represent the domain area. Identifying the key terminology used in the domain of interest is therefore a vital step. Table 1 defines different types of landslides. From these definitions, we first identify the terms for landslide types. Furthermore, we can extract terms such as shallow depth, elongated shape, rocky land, soil, moderate slope, etc.

Step 4 Define the classes and the class hierarchy
The terms created in the previous steps provide the basis for creating new classes that describe domain concepts. A class hierarchy can be created using a top-down approach, a bottom-up approach or a combination of both. The structure of the hierarchy depends on the usage and scope of the ontology. Varnes' landslide classification [39] is followed when constructing the class hierarchy.

Step 5 Define the properties of classes
Within the feature extraction process, different attributes based on spectral, spatial, geometrical and morphological features characterising geographical objects in satellite images are extracted. These attributes are represented as datatype properties in the ontology, and link individuals (image segments) to their data values. In OWL, every datatype property is an instance of the built-in class owl:DatatypeProperty and a sub-property of owl:topDataProperty [40]. The datatype properties created are shown in Figure 3.

[Figure 3: datatype properties defined under owl:topDataProperty: hasAsymmetry, hasGeom, hasLength, hasMeanCurvature, hasMeanProfCurvature, hasMeanSlope, hasRelBorderToNonRockyAgr]

Step 6 Define the restrictions associated with a class directly
In this step, we defined restrictions such as the domain and range of the properties. For example, the value of the property hasMeanSlope is restricted to be of datatype double.

Step 7 Create instances
The instances are created from the output of the segmentation and feature extraction module. The segmented image objects with all low-level features were treated as instances to be categorised into their respective landslide classes.

Modularisation of Ontology
In our work, the modularisation of the ontology resulted in two modules, namely a domain ontology and a feature ontology. Modularisation enables ontology transferability and avoids the reuse of a whole domain ontology when only a fragment of it is needed. With ontology modularisation, only relevant concepts and relations are used in the ontology being modelled [41].
• Domain Ontology
The domain ontology defines concepts that describe entities found in a particular domain. Here, the different types of landslide concepts make up the domain ontology. Each class and its sub-class relationships are represented with subset notation (⊆). For instance, relationship (1) states that engineering_soils and bedrock are subsets of material_type. Similarly, relationships (2)-(4) show the sub-classes of the classes landslides, rock_slide and translational_rock_slide, respectively. In Figure 4a, the hierarchical relationships of the domain ontology are presented in graphical form:

rotational_rock_slide, translational_rock_slide ⊆ rock_slide, (3)

• Feature Ontology
The feature ontology defines concepts that describe the characteristics of different attributes identified during the feature extraction process, based on spectral, spatial, geometrical and morphological features. Feature classes and their sub-class relationships are represented below. The sub-classes of the classes features, curvature, length, slope and shape are shown in relationships (5)-(9), respectively. The feature ontology graph with hierarchical relations is presented in Figure 4b:

curvature, length, slope, shape ⊆ features, (5)
concave_upward, planar ⊆ curvature, (6)
large_in_length, low_in_length ⊆ length, (7)
moderate_slope, steep_slope ⊆ slope, (8)
high_asymmetry, low_asymmetry ⊆ shape. (9)
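The sub-class relationships above can be pictured as a simple parent lookup. The sketch below is only an illustration in plain Python, not the OWL encoding used in the study; it includes just the relationships stated in the text, and the dictionary and function names are hypothetical.

```python
# Child class -> parent class, limited to relationships stated in the text.
PARENT = {
    "engineering_soils": "material_type",
    "bedrock": "material_type",
    "rotational_rock_slide": "rock_slide",
    "translational_rock_slide": "rock_slide",
    "curvature": "features",
    "length": "features",
    "slope": "features",
    "shape": "features",
    "concave_upward": "curvature",
    "planar": "curvature",
    "large_in_length": "length",
    "low_in_length": "length",
    "moderate_slope": "slope",
    "steep_slope": "slope",
    "high_asymmetry": "shape",
    "low_asymmetry": "shape",
}


def ancestors(cls: str) -> list[str]:
    """Walk up the hierarchy and collect every superclass of `cls`."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls: str, candidate: str) -> bool:
    """True if `candidate` is a direct or indirect superclass of `cls`."""
    return candidate in ancestors(cls)
```

For example, `is_subclass_of("planar", "features")` holds through the intermediate class curvature, which mirrors the transitive subsumption a reasoner derives from the ontology.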
Next, we developed rules using SWRL to determine which concept instances (in our case, image segments) belong to which domain and feature classes. The rules for assigning domain classes depend on the feature classes. This means that we need to assign feature classes prior to assigning domain classes. To associate instances with feature classes, we used feature values extracted from the image segments (e.g., the length feature can be determined by measuring the length of the image segment). In this process, we have categorised ontological rules into generalised and localised rules.

Ontological Rules
Modularisation of the ontological rules is performed to separate the rules that identify domain classes from those that identify feature classes. The aim of this modularisation is to isolate generic rules from localised rules that change with the datasets provided.

• Generalised Rules
These rules use classes from the feature ontology to identify classes from the domain ontology. We term them generalised rules because they are domain specific but applicable to any dataset from that domain. To develop such rules, expert domain knowledge must be extracted from domain experts or the literature. In our case, we extracted these rules from the definitions of the landslide classes (Table 1). The definition of a debris slide states that debris slides are found in thickly covered soil of moderate slope and low length. The rule extracted from this definition is expressed in SWRL as Rule (10).

• Localised Rules
These are rules that define the threshold values for determining which instances belong to their respective feature classes. The feature ontology shows that the feature concept curvature is further sub-classified into concave_upward and planar. To identify these feature concepts, we developed rules based on the different variables available in the datasets. For instance, we can identify whether an image segment instance is engineering soils or bedrock using Rules (11) and (12):

hasRelBorderToNonRockyAgr(?x, ?y) ∧ swrlb:greaterThanOrEqual(?y, "0.5"^^xsd:double) → engineering_soils(?x), (11)
hasRelBorderToNonRockyAgr(?x, ?y) ∧ swrlb:lessThan(?y, "0.5"^^xsd:double) → bedrock(?x). (12)

Instances with a hasRelBorderToNonRockyAgr value greater than or equal to 0.5 are classified as engineering soils; all others are classified as bedrock. Such rules are termed localised rules because their attribute threshold values change from one dataset to another. This modularisation into generalised and localised rules benefits the transferability of the rules. The generalised rules are completely transferable, while the localised rules are transferable but may require adaptation to new threshold values. Identifying the threshold values of the attributes used to identify feature concepts is a challenging task in rule-based image classification. To deal with this issue, we employed machine learning (ML) techniques.
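The effect of the localised material-type rules can be mimicked outside a reasoner with a plain threshold test. The sketch below is an illustration only: the function and parameter names are hypothetical, and the default threshold of 0.5 is the value quoted in the text for this dataset.

```python
def material_type(rel_border_to_non_rocky_agr: float, threshold: float = 0.5) -> str:
    """Assign a material feature class from hasRelBorderToNonRockyAgr.

    Values at or above the threshold map to engineering_soils and values
    below it map to bedrock, following the description in the text.
    """
    if rel_border_to_non_rocky_agr >= threshold:
        return "engineering_soils"
    return "bedrock"


# Hypothetical image segments with their extracted feature value.
segments = {"seg_1": 0.72, "seg_2": 0.31}
labels = {name: material_type(value) for name, value in segments.items()}
```

Keeping the threshold as a parameter reflects the point made above: the rule structure carries over to new datasets, while the threshold value may need to be re-estimated.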

Extracting Threshold Value
The rules that classify instances of segmented image objects into their respective landslide classes on the basis of their parameter values are extracted in this module. For each parameter, a threshold corresponding to each feature class (e.g., large_in_length, high_asymmetry) is determined.
For this task, an ML technique is used, as it is difficult for operators to extract rules from data. Moreover, rules derived with ML techniques are more routine and structured than human-crafted rules. This also reduces human involvement in the overall classification. The overall process of rule extraction is depicted in Figure 5.

Implementation of Random Forest Method
With this aim, we employed Random Forest (RF) as an ML method for extracting rule-based knowledge. RF [42] is an ensemble learning method that constructs multiple decision trees using bootstrapping for classification. The outputs of all trees are aggregated via plurality voting in order to classify a new input.
In this work, we used the R package 'randomForest', which implements Breiman's random forest algorithm. The necessary training data were created using the classification rule sets from [28] in eCognition. The predictor variables used in the RF model are asymmetry, length, curvature, profile curvature, slope, and relBorderToNonRockyAgr.
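The two ingredients of RF named above, bootstrapping and plurality voting, can be sketched in a few lines. This is a plain Python illustration of those two steps only, not the R 'randomForest' implementation used in the study; the function names and example labels are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows: list, rng: random.Random) -> list:
    """Draw len(rows) items with replacement, as done for each tree."""
    return [rng.choice(rows) for _ in rows]


def plurality_vote(tree_predictions: list[str]) -> str:
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one image segment.
votes = ["debris_slide", "debris_flow", "debris_slide", "rock_slide", "debris_slide"]
predicted_class = plurality_vote(votes)

# Each tree would be trained on its own resampled copy of the training rows.
sample = bootstrap_sample(list(range(10)), random.Random(0))
```

Because each tree sees a different resample, the trees disagree on some inputs, and the vote aggregates them into a single label.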
The issue in using RF is that the ease of interpretation is lost compared with a single decision tree model. The rules extracted from RF tree ensembles are very numerous. Hence, RF cannot be used directly for threshold value extraction of attributes. To resolve this issue, we used the inTrees (interpretable trees) framework [43], which can extract reduced sets of rules.

Rules Simplification Using inTrees Framework
Random forest is a good predictive model, but it behaves as a black box. The capacity to interpret a single tree is lost with such an ensemble algorithm, where hundreds of decision trees are created. The inTrees framework assists in model interpretation and gives reduced rule sets whose predictions match those of the RF model.
The inTrees framework is used to extract an interpretable and reduced set of rules from the RF model. This framework takes a tree ensemble as an input. The rules governing the splits in each tree of the ensemble are extracted. The extracted rules are measured and ranked based on their length, frequency, and error metrics. Next, the irrelevant and redundant rules are pruned. Finally, the pruned rules are simplified into a set of if/then rules [43]. The R implementation of this framework, the 'inTrees' package, is used in this work.

Feature Attribute Threshold Value Extraction from Rules
The threshold values for each attribute were extracted from the simplified rules pruned from the RF results. Such rules extracted from classification trees have been used to identify concepts for automatic ontology building [44]. In our work, we have developed an algorithm using a similar approach to determine the threshold values for pre-defined feature concepts. The algorithm used for threshold value extraction is shown in Algorithm 1.

Algorithm 1: Threshold value extraction algorithm
Input: Pruned rules from inTrees
Output: Threshold values for each attribute
Identify attributes used in the rules
Identify all the span values (SV_1, SV_2, SV_3, ..., SV_n) for each attribute
for i = 1 to n do
    for j = 1 to n do
        if SV_i ≠ SV_j and SV_i ⊆ SV_j then
            SV_i is a subspan of SV_j
        end if
    end for
end for
Select span hierarchy level based on feature attribute level
Map threshold value with feature sub classes
return threshold values

At first, the attributes (e.g., length, shape, curvature) used in the rules were identified. For each attribute, the span values were determined as shown in Table 4. Next, each span was checked to determine whether it was contained by other spans, in order to determine the span hierarchy as shown in Table 5. From the ontological knowledge, we know that the feature class curvature has two subclasses (concave_upward and planar), so we select two spans from the hierarchy: [−∞, −0.99] and (−0.99, ∞]. Finally, we identify the threshold values for the two curvature subclasses, as illustrated in Table 6.
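The containment check at the core of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions: spans are represented as (lower, upper) tuples, open versus closed endpoints are ignored, and "selecting a hierarchy level" is interpreted as keeping the spans that are not contained in any other span. The third span in the example is hypothetical; only the split at -0.99 comes from the text.

```python
import math


def is_subspan(inner: tuple, outer: tuple) -> bool:
    """True if `inner` differs from `outer` and lies within it."""
    return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]


def span_hierarchy(spans: list) -> dict:
    """Map each span to the list of spans that contain it."""
    return {s: [t for t in spans if is_subspan(s, t)] for s in spans}


def top_level_spans(spans: list) -> list:
    """Spans not contained in any other span, sorted by lower bound."""
    parents = span_hierarchy(spans)
    return sorted(s for s in spans if not parents[s])


# Hypothetical curvature spans taken from pruned rules.
curvature_spans = [(-math.inf, -0.99), (-0.99, math.inf), (-0.99, 0.5)]
selected = top_level_spans(curvature_spans)
```

With two curvature subclasses in the feature ontology, the two top-level spans give the candidate threshold at -0.99; which span corresponds to which subclass is left to the mapping step and is not assumed here.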

Ontology Based Classification
This module is the ontology-driven rule-based image classifier. The ontological rules were written using SWRL. In SWRL, a rule consists of an antecedent and a consequent, each of which is composed of a set of atoms. Atoms can be of the form C(x), P(x,y), sameAs(x,y), differentFrom(x,y), or builtIn(r,x,...), where C is an OWL description or data range, P is an OWL property, r is a built-in relation, and x and y are either variables, OWL individuals or OWL data values, as appropriate [18]. Both antecedent and consequent are conjunctions of atoms written as $a_1 \wedge a_2 \wedge \dots \wedge a_n$. Variables are indicated using the standard convention of prefixing them with a question mark (e.g., ?x). SWRL allows the use of built-ins such as greaterThanOrEqual and lessThan, whose role is to restrict the numeric values of the variables accordingly. For instance, the rule to identify 'debris_slide' is given as Syntax (13). This rule asserts that 'landslides' individuals that are also instances of 'engineering_soils', 'moderate_slope' and 'low_in_length' are 'debris_slide' individuals. The rule sets for the remaining landslide classes are 'debris_flow' (Syntax (14)), 'rotational_rock_slide' (Syntax (16)), 'translational_rock_slide' (Syntax (17)), and 'shallow_translational_rock_slide' (Syntax (18)).
These SWRL-based rules were executed using the Pellet reasoner [15], instead of eCognition, to classify the landslides. In [21], eCognition was the classification platform, and hence a simple XSLT (Extensible Stylesheet Language Transformations) stylesheet was used to translate from OWL into eCognition class descriptions. In this study, however, the Pellet reasoner was used as a classification platform independent of eCognition. Pellet is an OWL-DL (Web Ontology Language-Description Logic) reasoner that supports reasoning with individuals. This feature was required to associate every image segment instance with the appropriate domain classes. In this way, we assigned every image segment to its respective landslide class.
The classification results were exported in CSV (Comma Separated Values) and Shapefile formats. The classification result also serves as high-level input for segmentation, a process termed classification-based segmentation in GEOBIA [7]. For classification-based segmentation, the classified shapefile was used as input for re-segmentation.
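The conjunctive structure of such a rule can be illustrated outside the reasoner. The Python sketch below is not the SWRL/Pellet implementation used in this study; it only mimics how a rule whose antecedent is a conjunction of class-membership atoms assigns a segment to 'debris_slide'. The segment identifiers and their class memberships are hypothetical, and the function name `classify` is introduced purely for illustration.

```python
# Illustrative only: mimics a conjunctive rule, not the SWRL/Pellet pipeline.
# Each consequent class maps to the set of classes (antecedent atoms)
# that an individual must belong to for the rule to fire.
RULES = {
    "debris_slide": {
        "landslides",
        "engineering_soils",
        "moderate_slope",
        "low_in_length",
    },
}


def classify(memberships, rules=RULES):
    """Return the consequent classes whose antecedent atoms are all satisfied."""
    return sorted(
        consequent
        for consequent, atoms in rules.items()
        if atoms <= memberships  # every atom of the conjunction must hold
    )


# Hypothetical image segments and the classes they already belong to.
segments = {
    "seg_1": {"landslides", "engineering_soils", "moderate_slope", "low_in_length"},
    "seg_2": {"landslides", "engineering_soils", "moderate_slope"},
}

results = {segment_id: classify(m) for segment_id, m in segments.items()}
print(results)
```

In this sketch 'seg_2' is not assigned to 'debris_slide' because one atom of the conjunction ('low_in_length') is not satisfied, which mirrors how a reasoner fires a rule only when all antecedent atoms hold.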

Validation
Validation is performed by comparing the classification result with ground truth data. The accuracy of classification results is assessed using a standard error matrix (confusion matrix) to calculate Total Accuracy (Equation (19)), Random Accuracy (Equation (20)) and Kappa Statistic (Equation (21)) [45]. User and producer accuracy are calculated to evaluate omission and commission errors for each class. To evaluate these metrics, True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN) are calculated by validating the classification results against the ground truth data. In our study, we used the kappa statistic because it is widely employed by the remote sensing community as a standard measure for accuracy assessment, although we acknowledge criticisms of this technique [46,47]. The validation process was carried out using ENVI image analysis software Version 5.2 (Boulder, Colorado, USA) [48].
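These metrics can be sketched in a few lines of Python. The sketch assumes that Equations (19)-(21) correspond to the standard definitions of overall accuracy, chance (random) agreement and Cohen's kappa; it is not the ENVI procedure used in the study, and the 2 x 2 matrix is a hypothetical example rather than data from this work.

```python
def accuracy_metrics(matrix):
    """Compute accuracy metrics from a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is i
    and whose assigned class is j (an assumed layout).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # reference totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    total_accuracy = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    random_accuracy = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    kappa = (total_accuracy - random_accuracy) / (1 - random_accuracy)

    producer = [matrix[i][i] / row_totals[i] if row_totals[i] else 0.0
                for i in range(k)]
    user = [matrix[j][j] / col_totals[j] if col_totals[j] else 0.0
            for j in range(k)]
    return {
        "total_accuracy": total_accuracy,
        "random_accuracy": random_accuracy,
        "kappa": kappa,
        "producer_accuracy": producer,
        "user_accuracy": user,
    }


# Hypothetical two-class example.
metrics = accuracy_metrics([[40, 10], [5, 45]])
print(metrics)
```

Producer accuracy is computed along the reference totals (omission errors) and user accuracy along the assigned-class totals (commission errors).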

Knowledge Extraction Results
The Random Forest analysis resulted in a number of rules that were pruned and reduced using the inTrees framework. The reduced set of rules is shown in Table 7. The rules from Table 7 are used as input to Algorithm 1 to extract the threshold values of the feature variables. As explained in Section 3.3.3, for each feature variable, the span values are determined and the threshold values are calculated. For instance, length has two value ranges (length ≤ 1053.04 and length > 1053.04). These serve as the threshold values for the feature classes 'low_in_length' and 'large_in_length', respectively. We apply Algorithm 1 to extract the threshold values of the other variables, as shown in Table 8. The original values derived from [28] and the values extracted from the ML rules were very similar. Their close correspondence demonstrates how the extraction of threshold values from data sets can be automated using ML techniques. After determining the threshold values, the classification rules are written in SWRL, as shown in Rules 22 and 23 for the feature class 'length'. Similarly, using these newly extracted values, the localised rules for all the feature classes were created to perform the ontology-based classification:
hasLength(?x, ?y) ∧ swrlb:lessThanOrEqual(?y, "1053.04"^^xsd:double) (22)
hasLength(?x, ?y) ∧ swrlb:greaterThan(?y, "1053.04"^^xsd:double) (23)
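The step from an extracted threshold to a feature class and to the corresponding rule text can be sketched as follows. This is a minimal illustration rather than Algorithm 1 itself: only the length threshold (1053.04), the property name hasLength and the two feature class names come from the text above, while the helper names are hypothetical. The ASCII character `^` stands in for the conjunction symbol.

```python
LENGTH_THRESHOLD = 1053.04  # threshold for 'length' reported in the text


def length_feature_class(length, threshold=LENGTH_THRESHOLD):
    """Map a length value to its feature class using the extracted threshold."""
    return "low_in_length" if length <= threshold else "large_in_length"


def swrl_antecedent(prop, builtin, value):
    """Format the antecedent of a threshold rule as plain text.

    Hypothetical helper: it only builds a string and does not validate SWRL.
    """
    return f'{prop}(?x, ?y) ^ swrlb:{builtin}(?y, "{value}"^^xsd:double)'


rule_22 = swrl_antecedent("hasLength", "lessThanOrEqual", LENGTH_THRESHOLD)
rule_23 = swrl_antecedent("hasLength", "greaterThan", LENGTH_THRESHOLD)

print(rule_22)
print(rule_23)
print(length_feature_class(800.0))   # below the threshold
print(length_feature_class(1500.0))  # above the threshold
```

Keeping the threshold in a single named constant reflects the idea that data-dependent values can be re-extracted for a new data set without touching the structure of the rules.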

Classification Results
Figure 6 shows the ground truth polygons and the classification results from both the non-ontological and the ontological methods. Figure 6a is the result from the non-ontological method and refers to the output of a rule-based classification using eCognition software. In the ontological method, image segment instances are assigned to their respective classes by reasoning over the SWRL rules, as depicted in Figure 6b. Figure 6c is the reference landslide inventory prepared by [28] using landslide inventory data from [33,49]. The landslide polygons were drawn manually through stereoscopic analysis of satellite data and verified by detailed field investigation.
Table 9 shows the number of occurrences of each landslide class and the area covered by each classified landslide under the ontological and non-ontological methods. The results show that both classification methods yield the same number of classified landslides and the same area coverage. The number of landslides in each class and their area coverage are compared with the ground truth based inventory (Figure 7). This graph shows no match between the ontological method and the landslide inventory ground truth data. Thus, to further assess the accuracy and validate the classification result, a confusion matrix is generated.

Accuracy Assessment Results
The performance of the ontology-based classification is measured with the confusion matrix in Table 10. An overall accuracy of 86.3% and a kappa statistic of 0.79 are achieved with both the ontological and the non-ontological methods.
To visualise the inter-relationships between different landslide classes, a chord diagram is developed (Figure 8). In this diagram, each landslide class segment is represented by a different colour (e.g., 'translational rock slide' in yellow). The size of the colour band represents the proportion of classified pixels. 'Debris flow' has the highest value with 8793 pixels, and 'shallow translational rock slide' has the lowest value with 363 pixels. The length of the arc of each class represents the total number of classified pixels for that landslide class. The total number of classified pixels is placed outside the circle (e.g., 5979 for 'translational rock slide'). The strip that remains within its own segment represents true positives, whose value is shown inside the circle (e.g., 5502 for 'translational rock slide'). Strips going to different classes represent the number of pixels of the objects that are false positives. The strip going from 'translational rock slide' to 'rotational rock slide' is 477 pixels, and the strip in the opposite direction is 1699 pixels. This shows what portion of each landslide class is misclassified as another class. None of the pixels were classified as 'shallow translational rock slide', giving 0% producer accuracy. This 0% producer accuracy arises from the failure of the segmentation algorithm to delineate linear features: the 'shallow translational rock slide' class features are linear, whereas the generated segments were not. All 666 pixels classified as 'debris slide' are correctly classified, resulting in 100% producer accuracy. In the case of 'debris flow', 8596 out of 8793 pixels were classified as 'debris flow' and the remaining 197 were misclassified as 'debris slide'. Similarly, for 'rotational rock slide', out of 4179 pixels, only 2480 were classified correctly and 1699 were classified as 'translational rock slide'. The chord diagram shows that 5502 pixels were classified as 'translational rock slide' and the remaining 477 out of 5979 pixels were classified as 'rotational rock slide'.
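As a consistency check, the pixel counts quoted above can be combined to recover the reported overall accuracy. The short Python sketch below assumes that the per-class totals are reference totals and that the "correctly classified" counts form the diagonal of the confusion matrix; this reading is an interpretation of the text, not a reproduction of Table 10. The kappa statistic is not recomputed because the text does not state which classes received the 363 'shallow translational rock slide' pixels.

```python
# Pixel counts quoted in the text: (total pixels, correctly classified pixels).
counts = {
    "debris flow": (8793, 8596),
    "rotational rock slide": (4179, 2480),
    "translational rock slide": (5979, 5502),
    "debris slide": (666, 666),
    "shallow translational rock slide": (363, 0),
}

total_pixels = sum(total for total, _ in counts.values())
correct_pixels = sum(correct for _, correct in counts.values())
overall_accuracy = correct_pixels / total_pixels

# Share of each class's pixels that were classified correctly.
per_class = {name: correct / total for name, (total, correct) in counts.items()}

print(f"overall accuracy: {overall_accuracy:.1%}")
for name, share in per_class.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions, 17244 of 19980 pixels are correct, which rounds to the 86.3% overall accuracy reported for both methods.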

Ontological Approach
Adopting an ontological approach to image classification involves constructing the knowledge base as an ontology and defining rules on that basis for image object identification. These rules are developed using high-level information derived from domain experts and low-level feature values extracted from remote sensing data. The ontology-based methodology has both strengths and limitations, which are discussed below.
The use of a knowledge representation language such as OWL for constructing formalised domain knowledge makes that knowledge more shareable and extensible [21]. The modular approach treats the knowledge building process as a separate module from the image analysis process. This benefits domain experts, who are able to add new knowledge at any moment without interrupting the classification process. Furthermore, image interpretation then draws on knowledge developed through a consensus of domain experts, which reduces individual bias. Incorporating human expert knowledge as an ontology decreases human intervention during the image classification process. This helps GEOBIA to evolve towards a more automated process.
The ontology-based framework supports and improves spatial data interoperability. The use of open standard formats, such as OWL and SWRL, in this framework brings syntactic and semantic interoperability. Interoperability in turn assists in the transferability of knowledge, rules and results. In GEOBIA, the use of ontology also assists in data analysis, in addition to data discovery, data integration, and data publication [9]. The knowledge base created for landslides can be shared across different disciplines. In this case study, the rule sets developed to identify landslides for the Okhimath area can be transferred and reused for datasets from other regions with slight modifications.
Modularisation makes it possible to reuse only a certain part of an existing ontology, which reduces the overhead of loading a whole ontology when only a part of it is needed. Ontology modularisation also helps to mitigate the degradation in reasoner performance that occurs as the size of an ontology increases. Through its inferencing capability, the reasoner automatically classified the image segments once supplied with the SWRL rules. One issue to be considered when using inference-based classification is that it may take significant time when complex SWRL rules are used.
The lack of spatial analysis capability in the ontology-based classification module leads to a requirement for more interaction with image analysis tools. This can add complexity to the overall classification process. In the case study presented, we had to use eCognition software to calculate the area and distance to class for newly classified image objects that are used for classifying other classes. The use of custom spatial built-ins in SWRL rules could reduce such complexity. In addition, remote sensing or domain experts involved in the image analysis process might have little or no knowledge of ontology engineering, whereas this methodology requires users to possess an understanding of ontology and knowledge engineering.

Modularity
The concept of modularity has been introduced in technological and organisational design for tackling complexity [50]. Modularity refers to the subdivision of a system into smaller parts called modules. The key aspect of modular programming is that modules can be reused at different stages. In this work, modularisation can be seen at two levels: firstly, the modularity of the framework, which allows the steps to be divided into independent modules; secondly, the modularisation of the developed ontology and rules.
A modular approach has been adopted in GEOBIA frameworks [51][52][53]. Modularity makes a framework more customisable and expandable. For instance, with segmentation and classification as independent modules, a different tool can be used for each module. This demonstrates the flexibility and extensibility of the proposed framework over proprietary software, whose functionality is limited to what the software itself provides. Using a modular approach, Ref. [51] introduced system extension capabilities in their framework to incorporate third party functionalities. This opens the possibility for researchers to integrate different tools or build new algorithms on top of the existing framework.
In GEOBIA, complexity increases with the iterative process and composite workflow [54]. The modular approach follows the rule of divide and conquer to break down a complex task into a number of simpler tasks. This allows error assessment of the output at the end of each module, before it becomes the input to the next module in the work chain.
With modularity, the ontology construction process is isolated from image analysis tasks, allowing domain experts to create the knowledge base independently. A group of domain experts can work collaboratively in the knowledge construction process. Thus, knowledge-based image analysis in GEOBIA becomes less dependent on the expertise of a particular analyst. Ontology modularisation further assists in tackling transferability issues. The transferable domain ontology is developed separately from the data-dependent feature ontology. The data-dependent ontology or rules need adaptation to make them transferable.

Classification Results
We performed two rule-based classifications using the same set of rules: firstly, the ontological method proposed in this paper, and secondly, the non-ontological method from the published literature [28]. In the experiment, the same segmentation technique was used in both cases, resulting in the same number and shapes of image segments. The classification result of the ontological method was found to be consistent with that of the non-ontological method in terms of classified object counts, thus benchmarking the performance of the proposed method.
This shows that an ontological approach contributes a complementary classification method in GEOBIA, with added benefits provided by knowledge formalisation. The advantages of the ontological method over non-ontological methods are data interoperability, knowledge transferability, semantic inferencing and more automation with less human intervention.
The numbers and shapes of the classified objects did not match those of the reference landslide inventory based on ground truth. Thus, a confusion matrix was calculated to assess the accuracy of the image classification by comparing the classified image segments with the reference landslide inventory. An overall accuracy of 86.3% was achieved. The discrepancy between the classification result and the reference data depends on the segmentation result and on the threshold values defined for the different features in the rules. This calls for further study on improving image segmentation, which was not considered here because our work is primarily focussed on the applicability of the ontological methodology in GEOBIA.

Limitations
The proposed ontological framework may suffer from computational inefficiency due to its dependency on the capabilities of the reasoner. With an increase in the number of instances, classes, relations and axioms, the reasoning time may increase significantly. To tackle this issue in future work, we will further explore the use of OWL 2 profiles that trade expressive power for the efficiency of reasoning.
The accuracy of the rule-based classification is influenced by the scale level of image segmentation. Different objects are identified at different scale levels depending on their spatial and thematic characteristics. This means that, if the classification is carried out with image objects segmented at an inappropriate scale level, the classification result will be inaccurate. However, this is an open issue in GEOBIA, which warrants further exploration and study [55].
With machine learning, good training data are needed to achieve better prediction. In the absence of adequate training data, the extracted threshold values might depart from the actual values.

Conclusions
This study proposes a framework for object-based image analysis using ontology and applies the framework to landslide detection. The framework requires construction of an ontology for a domain of interest. In the case study reported here, the ontology is based on knowledge provided in previously published work that used GEOBIA. The use of ontology allows inference on domain knowledge to bring semantic image analysis into GEOBIA. GEOBIA requires human intervention in the form of expert knowledge for defining classification rules and the threshold values for attributes used in those rules. To reduce this intervention, we incorporated machine learning into an ontological framework for automatic extraction of the threshold values used in the rule-based classification of GEOBIA. Modularisation of ontology is introduced to separate the transferable domain ontology from the non-transferable feature ontology. This study captures high-level domain knowledge from experts and low-level knowledge from data using machine learning techniques. We formalised domain expert knowledge in the specific field of landslides and benchmarked the framework by comparing it with published work. The scope of this study is landslide detection, which has not previously been studied using an ontological framework.
This study helps to progress the application of ontological methods within GEOBIA. It demonstrates a novel approach to automatically extracting threshold values for feature attributes used in ontological classification rules. The developed approach distinguishes between transferable and non-transferable ontologies. In addition, it demonstrates the application of these methods to a new domain, landslide detection.
An avenue for further research is incorporation of spatial rules and exploration of optimal combinations of segmentation, classification-based-segmentation, and, in turn, the final classification in an ontology driven GEOBIA framework.

Figure 1. Study area located on the Madhyamaheshwar sub-catchment of Okhimath in the Uttarakhand state of India.

Figure 3. Datatype properties to represent extracted attributes.

Figure 5. Workflow process for rules threshold extraction.

Figure 6. Landslide classification (a) results from the non-ontological method; (b) results from the ontological method; (c) ground truth.

Figure 7. Comparison between ground truth inventory and classification results from the ontological method.

Figure 8. Confusion matrix result represented in chord diagram to display the inter-relationships between different landslide classes.

Table 4. Determining span values of attributes used in rules.

Table 5. Span hierarchy for curvature attribute.

Table 6. Identification of threshold values.

Table 7. Pruned rules from random forest using the inTrees framework.

Table 8. Extracted threshold values from rules.

Table 9. Number and area coverage of classified landslides classes.