Explainable Artificial Intelligence for Developing Smart Cities Solutions

Abstract: Traditional Artificial Intelligence (AI) technologies used in developing smart cities solutions, Machine Learning (ML) and recently Deep Learning (DL), rely more on utilising best representative training datasets and feature engineering and less on the available domain expertise. We argue that such an approach to solution development makes the outcome of solutions less explainable, i.e., it is often not possible to explain the results of the model. There is a growing concern among policymakers in cities about this lack of explainability of AI solutions, and this is considered a major hindrance to the wider acceptability of and trust in such AI-based solutions. In this work, we survey the concept of ‘explainable deep learning’ as a subset of the ‘explainable AI’ problem and propose a new solution using Semantic Web technologies, demonstrated with a smart cities flood monitoring application in the context of a European Commission-funded project. Monitoring of gullies and drainage in crucial geographical areas susceptible to flooding issues is an important aspect of any flood monitoring solution. Typical solutions for this problem involve the use of cameras to capture images showing the affected areas in real-time with different objects such as leaves, plastic bottles etc., and building a DL-based classifier to detect such objects and classify blockages based on the presence and coverage of these objects in the images. In this work, we uniquely propose an Explainable AI solution using DL and Semantic Web technologies to build a hybrid classifier. In this hybrid classifier, the DL component detects object presence and coverage level, and semantic rules designed in close consultation with experts carry out the classification. By using the expert knowledge in the flooding context, our hybrid classifier provides flexibility in categorising the image using objects and their coverage relationships.
The experimental results demonstrated with a real-world use case showed that this hybrid approach of image classification has on average 11% improvement (F-Measure) in image classification performance compared to a DL-only classifier. It also has the distinct advantage of integrating experts’ knowledge in defining the decision-making rules to represent the complex circumstances and using such knowledge to explain the results.


Introduction
In 2017 there were more than 250 smart cities projects in 178 cities worldwide. The prevalence of building smart cities across the globe is largely driven by the use of Internet of Things (IoT) and Big data technologies, with smartness attributed to the use of Artificial Intelligence (AI) techniques and tools [1]. In particular, Machine Learning (ML), and more recently Deep Learning (DL) techniques into three different class labels based on the severity. However, preliminary evaluation with experts revealed that our model lacked transparency in two respects: how objects were related to each other in scenarios where drainage and gully blockages were known to be caused mainly by the aggregation of multiple objects, and how the model was classifying drainage and gully images into different classes. The experts wanted to know the reasoning behind the classification to help them and us understand why the model was classifying an image in one of the three blockage categories (No Blockage, Partially Blocked and Fully Blocked). However, owing to the inherent limitations of how CNN classifiers function, it was not possible to explain the crucial decisions made by the model. To address this 'explainability' deficit of CNN models, in this work we uniquely propose the use of Semantic Web technologies [21], in particular ontologies and reasoners [22]. Ontologies, also often referred to as knowledge graphs, allow us to capture the relationships between concepts and entities in a particular domain for better reasoning and explanation. For example, in the domain relevant to this paper, it is possible to work with drainage maintenance teams in city councils to capture, in ontological languages, the concepts of different 'objects' that are commonly found as obstructers when dealing with gully blockages, or the inter-relationship and level of their coverage leading to the different levels of blockage.
In this paper, we demonstrate how such knowledge captured in ontologies can be used to classify an image and also to explain the result, i.e., why and how the algorithm arrived at that decision.
Considering these, an approach that combines both ML in the form of CNN and expert knowledge has been explored in this research. We propose a hybrid model that consists of machine learning and semantic rules set out to analyse the inclusion of expert knowledge in image classification. This approach can be utilised more transparently and effectively in the context of real-time applications. In this hybrid model, the semantic rules are defined to represent the context of the presence of objects in an image, whereas the CNN is used to identify the object coverage from the image dataset. The use of semantic rules provides the capability to use multi-criteria decision making on any instance, whereas the CNN provides feature extraction in the form of object coverage levels within the image. The use of a hybrid approach for image classification can fill the gap between ML implementation and the implementation of experts' knowledge when monitoring a real-time instance. Section 2 of the paper surveys the literature on the use of ML and Semantic Web technologies in addressing the challenges of building smart cities solutions. Section 3 provides context to this work and outlines the use case. Sections 4 and 5 provide details of a hybrid image classification model consisting of Semantic Web technologies and the use of DL models. The experimental design and results are discussed in Section 6. Finally, we conclude in Section 7 by summarising the advantages of our unique approach and also outline the future work.

Literature Review
In general, a smart city is defined as a city that monitors and integrates critical infrastructure and services through sensor and IoT devices [23]. The IoT devices capture data on the instances under monitoring, and those data are transferred and stored for further processing. Smart city applications integrate real-time IoT data with rational methods to design and plan decision making more systematically. For decision making, human experts have been the core element of all analysis and objectives [24]. Human experts often make decisions based on the data arriving at an application. Decision making in the context of smart cities has become more challenging because of the volume of information available and the involvement of multiple stakeholders [25].
Data from a wide range of sensors and IoT-based applications are stored and processed in the digital infrastructure of cities to support decision making [26]. With the change in tools and techniques on the digital platform, a variety of data from apps and other real-time data sources are combined for analysis and decision making to address specific issues of the city [27]. Both qualitative and quantitative data are used in decision making to meet objectives that often conflict. Therefore, all smart city applications have had to operate and adapt their working routines in more effective ways. Any changes must depend on the data being used for the application, and hence on the decision making [28].
One of the major challenges of decision making in the smart city has been how to deal with application data and how to apply human expert knowledge. Semantic Web technology in combination with data analysis has been used for expert-based recommendation systems [29,30]. The semantic techniques give the flexibility to apply human experts' knowledge and control over the prediction model. In different application-based systems, the implementation of semantic representation enhanced performance levels, as the semantic rules were developed by applying high-level information gathered from experts in the corresponding fields. We list several types of transparency expected in smart cities models. Each may require a different sort of explanation, requiring different measures of efficacy [18]: (1) For a company or a service provider: to understand and explain how their system works, aiming to identify the root cause of problems and see whether it is working well or not, and explain why. (2) For end-users: human users need to trust AI systems in obtaining their needs, but what should be the basis for this trust? In addition to providing end-users with knowledge on the system's prediction accuracy and other aspects of the performance, providing users with an effective explanation for the AI system's behaviour using semantic rules that are derived from the domain experts can enhance their trust in the system. (3) For society: it is important to consider the possible impact of AI in terms of increased inequality (bias) and unethical behaviours. We believe it is not acceptable to deploy an AI system which could make a negative impact on society.
Flood monitoring is one of the major concerns in most cities around the world. Over the years, rainfall forecasts and satellite images have been used for predicting and monitoring flooding conditions. A wide range of examples also exists, primarily in the modelling of natural hazards and emergencies [31] or the wider context of flooding [32]. Methods such as liquid level monitoring [33] and gully pot water level monitoring [34,35] have been applied. Despite the improved access to rainfall data, water level readings, satellite imagery and improved forecasting accuracy, real-time monitoring to support decision making is still a challenge [36]. There are some limited examples of the application of semantic technology for flood risk analysis by applying an expert's knowledge [37]. However, the model did not apply any machine learning to analyse and extract features from the existing flood-related data to support risk analysis. In this example, semantics were only used as a descriptive mechanism, using high-level information for a recommendation system. Much research in flood monitoring involves the use of imagery available in the public domain. However, these studies are either too generic or rely on further contextual data to support classification tasks [38].
Object analysis has been used in image-based decision-making applications. In such applications, objects are identified and analysed individually based on extracted information [39][40][41]. This extracted information has been used to classify the images into corresponding classes. During the object analysis, target objects were often defined and all other objects were treated as noise [42]. An image can be classified into multiple classes depending on the types of objects present within the image along with the context of the object. In multi-object scenarios, multi-class label approaches are applied to classify such images [43,44]. In those approaches, one classifier is trained for each object to be analysed in the image. The classified outputs are combined with the probabilities of the individual classes to classify the image. Object analysis along with semantic representation is also used for scene recognition [45,46], where the detection of objects and semantic segmentation are applied to recognise the scene. Apart from these approaches, ontology-based approaches are also used to retrieve a specified [31] image from the library [47,48]. In the ontology-based methods, computer vision and manual annotation are used to search for the specified image in the library.
Machine learning algorithms have been widely used for image classification by analysing the characteristic features of the images and organising them into corresponding categories or classes. Explainability has been applied in recent years, which has allowed control and understanding of a machine learning model. Samek et al. [8] presented why explainability had been an effective alternative for detecting weaknesses in the model. Abdullatif et al. [49] showed that mining big data is challenging, as this may depend on time. Traditional solutions do not exploit the intrinsic streaming nature of the incoming data: a continuous, possibly unlimited flow of data that evolves over time. Stream data mining algorithms, including fuzzy methods, can incrementally adapt to non-stationary changes in the data by creating models that adapt their structure with new data. Moreover, these techniques can track changes in the incoming data, with the ability to remove outliers from it.
Adding explainability to a model makes it possible to verify the system, understand its weaknesses and detect the biases of the model. Explainability helps in understanding the model's processes, how the model makes its predictions and why it sometimes fails to make the correct predictions. The use of an explainable expert system allows the detailed knowledge of the system to be extracted, which gives the flexibility of modifying the model with new insights. Explainability covers the justification behind model predictions, the inner workings of the model and approaches to model enhancement [9]. Model debugging is one of the approaches to analysing and fixing errors within machine learning. A visualisation tool could be applied to understand the structure of the model. The visualisation could also help in understanding the data that is responsible for the prediction, and hence provide clues for further improvement of the model.
Model refinement by incorporating expert knowledge through expert interaction has been applied for the improvement of the model. However, to design an explainable model, two major technical challenges need to be highlighted [19]. The first challenge was to accurately extract features from noisy and sparse data into the model since it was challenging to address the relationships between high-level and low-level features and their semantic meaning. The second challenge was to generate easy and understandable explanation from the multilevel model structure. A knowledge-base representation was applied to understand heterogeneous entities based on the embedded knowledge base [50]. The knowledge graph focused on how to represent different types of properties over heterogeneous entities.
The main goal of the explainable system is to link the human context-based reasoning with facts to construct contextual explanatory models [51]. Human understanding is applied for the intellectual understanding of the context apart from data. Humans can understand the context using very few data and bring reasoning into decision making in a particular task. In other words, the explainable system provides extensions to the existing system to apply human capabilities and understanding to make any decision or action. Explainability is used in multi-domain operations to strengthen confidence, knowledge representation and reasoning. Situational understanding is required to conclude multi-domain scenarios. However, the situational understanding depends on data coming from the machine and context considered by humans [52].
The effectiveness of the explainable system has often been restricted because the criteria are ill-defined and difficult to interpret [20]. Interpretability of deep learning networks can be increased by applying learning via human-computer interactions and representation. Oftentimes, the explainable system aims to provide human-readable and interpretable explanations of decision making [53]. A twin system consisting of a black-box built from a machine-learning method and a white-box built from human knowledge has been used for the interpretation of chronic renal disease, since black-box systems showed limited effectiveness in explaining the reason behind the predictions or classifications [54]. An adversarial approach was used to modify the explainable system, where modification of the system was based on the information of the relevant features [16]. Explanations had been provided for understanding the model's reasoning, which was consistent with expert knowledge. A hybrid human-in-the-loop approach was applied where machine learning was improved using extracted domain expert knowledge [55]. Human-in-the-loop approaches iteratively apply experts' input on machine learning with reasoning.
A combination of the logic-based approach and the probabilistic machine learning approach is required to build context-adaptive systems [56]. Integration of these two approaches combines the strengths of each to make an effective explainable knowledge-based system. A user-centric explainable decision support system was applied, where the system linked the human reasoning process with intelligent explainable techniques [57]. The system had scaled up the machine learning model with user understanding and observations of an event. An explainable framework was used for anomaly detection. In the framework, the deep learning model detected the anomalous event with the confidence in the description of the detected anomaly and the relevant factors, i.e., the framework had justified the decisions [58]. Deep Learning or black-box models are not suitable for cognitive cities where collaborative approaches have been used between the city and its citizens. Information and knowledge sharing between humans and machines is required in cognitive cities. Explainable intelligent systems can adopt human knowledge in such collaborative approaches [59].
Convolutional Neural Network (CNN) models have been applied in image classification tasks [42,60]. There has not been any single architecture for CNN development. The CNN structure has been modelled by altering network parameters such as the number and types of layers and the activation function [61,62]. These parameters are tuned iteratively to increase model accuracy. A CNN model used as an image classifier takes an unclassified image as an input and classifies the image into a class label based on the features extracted. CNN is one of the approaches that has also been applied to multi-class classification [62,63]. In an application, either one CNN model or multiple CNN models could be designed based on the task [64,65]. When multiple models were developed, each model performed a specific task, and the outputs of the models were combined.
CNNs have achieved remarkably high accuracy in many image analysis applications. However, the network depends heavily on the amount of data used for training [66], and in many application domains only a small number of training images is available. Data augmentation is a commonly used approach in many machine learning models to virtually increase the training dataset, both to enhance model accuracy and to avoid overfitting. Different augmentation techniques, such as geometric transformations, feature space augmentation, colour space augmentation and random erasing, have been applied. Taylor and Nitschke [67] applied geometric transformation augmentation, which changed the shape, size and orientation of the original image during the augmentation. Image augmentation by pairing samples is another approach, where a new image is formed from an image by overlapping another image randomly selected from the dataset [68]. Auto-augmentation techniques have been applied which define probabilities to allow a function for the image augmentation in terms of translation, colour normalisation and rotation [69]. Different image augmentation techniques have been used in different applications; however, in all those applications, the common objective was to increase the training data set to enhance the model performance.
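To make two of these augmentation families concrete, the following is a minimal sketch of geometric transformation and sample pairing. It assumes same-sized greyscale images stored as nested lists of pixel intensities. The function names and the equal-weight averaging used for pairing are illustrative assumptions, not the implementations reported in [67] or [68].

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right (a geometric transformation)."""
    return [list(reversed(row)) for row in img]


def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise (a geometric transformation)."""
    return [list(row) for row in zip(*img[::-1])]


def pair_samples(img_a, img_b):
    """Overlay two same-sized images by averaging their pixel values.

    Equal weighting is an assumption made for this sketch.
    """
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def augment(dataset, rng):
    """Return the original images plus flipped, rotated and paired variants."""
    augmented = list(dataset)
    for img in dataset:
        augmented.append(flip_horizontal(img))
        augmented.append(rotate_90_clockwise(img))
        partner = rng.choice(dataset)  # randomly selected partner image
        augmented.append(pair_samples(img, partner))
    return augmented
```

Under these assumptions, each original image contributes three additional training samples, so a dataset of n images grows to 4n.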
Semantic technologies have been one of the approaches used to explore real-time complex data streams, analyse them and provide a higher level of understanding in Smart City environments [70][71][72]. With the extremely complex nature of IoT data for smart cities, such as the variety of representations, distribution, scales and densities, the semantic web offers a solution by providing a layer of abstraction, standardisation and semantics to solve interoperability issues [73]. Furthermore, this facilitates the fusion of a variety of IoT data with existing knowledge from standard data sources and linked data [74,75]. One of the challenges of smart cities has been integrating a large number of applications. Conventional applications can find it complex to communicate with multiple objects and heterogeneous data. Semantic technologies have been applied to integrate multiple heterogeneous IoT devices for monitoring real-time events and for reasoning to support intelligent systems [76][77][78], i.e., semantic technology has provided the flexibility to integrate heterogeneous data and functionalities into a common platform.
At the heart of semantic web applications, ontologies play a key role. In simplistic terms, ontologies are "partial, the simplified conceptualization of the world as it is assumed to exist" [79], and essentially involve the formal definition of a domain as a set of concepts and their entity types, properties and interrelationships [80]. The ontology-based semantic approach has improved the interoperability between applications. To integrate many applications in smart cities, an ontology catalogue was used [81]. The ontology catalogue has allowed researchers to share and reuse domain knowledge. The ontological solution uses internal and external knowledge about environment, behaviour and activities [82]. One of the key motivations for applying ontologies is to ensure heterogeneous sensors can be integrated effectively during natural disasters and crises [83,84].
Several works have used semantics and ontologies within smart cities for a variety of functions, e.g., to add value to data collected from sensor and social data streams [85], to combine sensor and social data streams with machine learning techniques and to facilitate interoperability and information exchange [83]. A range of projects has explored the use of the semantic web in specific domains, such as energy and transport [28], mobility and autonomous vehicles [86], traffic jams and delay [87], surveillance system [88], emergency management [89], environment [90], parking [91], energy [92,93], water [94] and so on. Several projects and frameworks have also emerged, aiming to exploit semantics to enable interoperability, e.g., OpenIoT [95], CityPulse [96] and VITAL [97].
Over the past few years, a variety of ontologies have also been developed for use within smart cities environments for IoT, sensors, actuators and sensor observations, such as [28][98][99][100]. Over many years, there has been a considerable effort in developing ontologies for defining sensor networks, with the SSN (Semantic Sensor Network) ontology in particular being one of the most commonly extended and adapted ones. The SSN ontology is a domain-independent model that covers sensors, sensing and measurements, and also incorporates other models such as SensorML and Observations and Measurements (O&M). Many studies have also extended the SSN ontology to adapt to their specific needs and domains such as environment [101], smart buildings [102], energy monitoring [103] and transport [104]. Ontologies applied within flooding to incorporate sensor data, although available, are limited [105][106][107][108] and applied in specific scenarios, as noted in a recent systematic review of flooding ontologies [109]. However, the application of deep learning and semantic web in disaster response has been limited, primarily aimed at classification and identification of disaster-related information in social media [110][111][112] or analysing remote sensing [113] and aerial imagery [114]. The use of semantic technologies in smart cities has led to discovering new opportunities such as information discovery, categorisation of events, complex event processing and reasoning for decision making, as the semantic networks provide a powerful way of transforming knowledge into machine-readable content [115].

Flood Monitoring in Smart Cities
Flood monitoring has been one of the major issues in smart cities. Drainage and gully blockages have been identified as the foremost reason for urban flooding; hence, monitoring of drainage and gullies is an important aspect of flood monitoring systems. Blockage of drainage and gullies on streets and roads is a condition in which external objects obstruct the normal flow of water. However, building and installing electronic sensors to monitor them is complex and not always feasible. Therefore, an alternative technique is required to monitor drain and gully blockages for the effective monitoring of flooding instances. Capturing drainage and gully images in real time using a smart camera, as shown in Figure 1, and then analysing and classifying the images, can detect a potential flooding threat. The effectiveness of the monitoring depends on the proficiency of the image classification task. Therefore, an efficient image classification method is required for classifying drainage and gully images to identify the blockage level, and hence raise the flooding alert. Drainage and gullies often get blocked due to the accumulation of objects on streets and roads.

Major Objects and Their Significance
To learn about the major objects that typically cause drainage and gully blockages, a workshop was organised with five experts working in the drainage and gully blockage and cleaning section of the Bradford Metropolitan District Council. Following discussion with the experts during the workshop and analysis of more than 50 randomly selected images, four major objects, namely "Leaves", "Mud", "Plastic Bag and Bottle" and "Water", were identified as the most common objects to consider when monitoring drainage and gully blockage. These four major objects and their significance in drain and gully blockage are summarised as follows:
i. Leaves: Leaves were raised as one of the most prevalent problems when it comes to blockages. Once leaves enter the drainage, they become less of a problem, as they can pass through the sewage system relatively easily. The real problem is when the leaves gather on top of a drainage system and begin to form dams if they cannot pass through, as shown in Figure 2.
ii. Silt (Mud): Silt is solid, dust-like sediment that water, ice and wind transport and deposit. Silt is made up of rock and mineral particles that are larger than clay but smaller than sand, as shown in Figure 3. During the discussion, silt was identified as a major problem for drainage and gully blockage if drains were not cleaned sufficiently regularly and silt was allowed to build up. Furthermore, if silt accumulates for a longer period, it can be fertile enough for vegetation to grow relatively easily, which can cause further problems with the drainage system.
iii. Plastic and Bottles: Plastic and bottles were identified as another major risk to the drainage system because these objects can cover the drainage and restrict the water flow into the sewage system, as shown in Figure 4. Further discussions revealed that bottles by themselves are not an issue, but in combination with other litter or debris, they raise the risk of blockage. As discussed with experts, bottles would typically be pushed up against the entryways to the drainage and gully, leaving the access way either blocked or restricted.
iv. Water: Finally, water was identified as one of the four major objects to be monitored when deciding the drainage and gully blockage. The presence of water along with other objects and their coverage, as shown in Figure 5, is the key factor in deciding the blockage level.

Convolutional Neural Network for Object Coverage Detection
In this flood monitoring application, the drainage and gully images show multiple objects. The presence of these objects causes blockages of drainage and gullies, and hence, urban flooding. Detection of the objects with their coverage level within the image is crucial for detecting drainage and gully blockages. An efficient image classification method is required for classifying drainage and gully images. Considering the need for object coverage detection based on image features, CNN models have been built. The details of the CNN models are presented in Section 5.4.1.

Semantics for Flood Monitoring
In our solution, semantic techniques enable understanding of the characteristics of objects and the context of these objects, with the use of explicit formal rules. Decision making on the detection of drainage and gully blockage is a contextual problem, as it depends on the types of objects present and their proportion. For example, the detection of water coverage in an image is not enough to classify the blockage level in a flooding context. The presence of other objects in combination with the accumulation of water defines the severity of drainage and gully blockage, i.e., the presence of other objects and their relationship is crucial to decision making. The object coverage levels detected by CNN models do not by themselves make the classification explainable. Expert knowledge adds explainability to the system's decision making. To bring control over decision making, a semantic representation and a set of semantic rules are defined. Through a knowledge elicitation process involving interviews with experts, the relationship among object coverage levels and the context of drainage and gully blockage instances have been articulated with semantic representations. Semantic rules bring the knowledge of the domain experts into the system's decision making to classify the image into a class label. The details of semantic rule formulation are presented in Sections 5.5 and 5.6.

Hybrid Image Classification Models with Object Coverage Detectors and Semantic Rules
We propose a novel hybrid image classification model that classifies the drainage and gully images into a class label. This hybrid approach is a combination of machine learning and semantic techniques. In this hybrid approach, the machine learning algorithm is used to detect object coverage proportion within the image, whereas the semantic technique has been used to define the relationship among the objects based on the detected object coverage level. The classification process of this proposed hybrid image classifier is described with the conceptual flow diagram, as shown in Figure 6. In this hybrid image classifier, the classification method consists of three computational steps: "Object coverage detection", "Semantic representation and rule base formulation (indicated by 'Rule Base')" and "Inference Engine".

Object Coverage Detection
Drainage and gullies may get blocked with litter materials such as leaves, mud, plastics and bottles. The list of individual objects causing the blockage is considerable, and hence, identifying every object's coverage is not practically feasible within object coverage detection and image classification. Therefore, detecting the coverage of the major objects is crucial to developing an efficient image classification model. Information on the presence of the objects within an image is not sufficient to apply conclusive reasoning when deciding the blockage level. To strengthen the classification decision, the coverage level of each detected object within the image is used. Coverage detectors are built using CNNs that classify the image into one of four coverage levels, namely: Zero, One, Two and Three. These coverage detectors are iteratively modelled, trained and tested with sample images.
To detect the presence of objects within an image, the object coverage detector for each object was applied. Four object coverage detectors (one for each object) are built using CNN, where each detector has been modelled, trained and tested with sample images. Each object coverage detector detects the presence of those objects within the image.
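The way four per-object detectors could be combined into a single coverage description of an image can be sketched as follows. This is a minimal illustration, assuming each detector returns one probability per coverage level. The stub detectors stand in for trained CNNs, and all function names are hypothetical.

```python
COVERAGE_LEVELS = ("Zero", "One", "Two", "Three")
OBJECTS = ("Leaves", "Mud", "Plastic Bag and Bottle", "Water")


def coverage_level(probabilities):
    """Map a detector's four class probabilities to a coverage level name."""
    if len(probabilities) != len(COVERAGE_LEVELS):
        raise ValueError("expected one probability per coverage level")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return COVERAGE_LEVELS[best]


def detect_coverage(image, detectors):
    """Run one detector per object and collect the predicted coverage levels."""
    return {obj: coverage_level(detectors[obj](image)) for obj in OBJECTS}


def make_stub(probabilities):
    """Build a stand-in detector that ignores the image (assumption for the sketch)."""
    return lambda image: probabilities


# Hypothetical detector outputs; real detectors would be trained CNN models.
stub_detectors = {
    "Leaves": make_stub([0.1, 0.2, 0.6, 0.1]),
    "Mud": make_stub([0.7, 0.1, 0.1, 0.1]),
    "Plastic Bag and Bottle": make_stub([0.2, 0.5, 0.2, 0.1]),
    "Water": make_stub([0.0, 0.1, 0.2, 0.7]),
}

coverage = detect_coverage(None, stub_detectors)
```

Keeping one detector per object means each model solves a four-class problem, and the resulting dictionary of coverage levels is the only information passed on to the rule-based stage.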

Semantic Representation and Rule Base Formulation
Semantic rules for the image classification are defined based on expert knowledge captured during a workshop. During the workshop, experts were asked to categorise sample images into three class labels: "fully blocked", "partially blocked" or "no blockage". Experts were also asked to identify the objects in the images and to explain their reasoning for assigning each image to its class label. Five experts classified each image, and we observed that the experts sometimes assigned different class labels to the same image. In such cases, a majority count approach was applied for knowledge extraction, i.e., the image was assigned the class label with the majority vote.
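The majority count step can be illustrated with a minimal sketch. The function name and the example votes are hypothetical, and the handling of ties is an assumption, since the text does not state how a tie among the five experts would be resolved.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by most experts, or None when the top count is tied."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        # Tie between the two most frequent labels: left unresolved in this sketch.
        return None
    return counts[0][0]

# Hypothetical votes from five experts for one image.
votes = [
    "fully blocked",
    "fully blocked",
    "partially blocked",
    "fully blocked",
    "partially blocked",
]
print(majority_label(votes))  # fully blocked
```

With five experts and three labels, a 2-2-1 split is possible, which is why the sketch returns None rather than picking a label arbitrarily.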

Inferencing and Image Classification
Inferencing is applied to classify the image based on object coverage and semantic rules. The inference engine selects the appropriate rule from the rule base, depending on the coverage level detected for each object, and that rule classifies the image into the corresponding class label.
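As a rough illustration of rule selection, the sketch below matches detected coverage levels against a small rule base and falls back to a default label when no rule matches. It is plain Python rather than SWRL, and the data layout and names are assumptions. The first rule mirrors the detector output reported later for Figure 15a; the second rule is purely hypothetical.

```python
DEFAULT_LABEL = "No Blockage"  # default used when no rule is selected

# Each rule pairs required coverage levels (0-3) with a class label.
RULES = [
    # Combination reported in the text for a correctly classified image.
    ({"mud": 3, "leaves": 1, "plastic_bottle": 0, "water": 0}, "Fully Blocked"),
    # Hypothetical rule, included only to show a second entry.
    ({"mud": 0, "leaves": 2, "plastic_bottle": 0, "water": 0}, "Partially Blocked"),
]

def classify(levels, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose conditions all hold, else the default."""
    for conditions, label in rules:
        if all(levels.get(obj) == lvl for obj, lvl in conditions.items()):
            return label
    return default

detected = {"mud": 3, "leaves": 1, "plastic_bottle": 0, "water": 0}
print(classify(detected))  # Fully Blocked
```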

Methodology
In this section, the preparation of training data for the object coverage detector models, the implementation details of the machine learning component and the use of semantic rules for image classification are presented in detail. The performance of the hybrid classification model, in terms of the accuracy of object coverage detection and of image classification, is also analysed. Moreover, the classification results are compared with those of the machine learning-based image classification model.

Data Construction
Preparing datasets for the object detectors was challenging due to the lack of public datasets on drainage and gully blockages. To overcome this, images were collected from publicly available sources such as Google Images, YouTube videos and other public sources. These images were retrieved using multiple keywords, such as 'drainage block', 'rubbish and drain blockage' and 'drain-grate and flooding'. Not all of the collected images were useful for our problem domain, as some were noisy, irrelevant or blurred. To remove those images from the dataset, manual supervision was applied during dataset preparation: each image was analysed in terms of its size, its quality and how closely it relates to the problem domain. After data pre-processing, the image dataset for each object type was prepared.

Image Augmentation
An effective Convolutional Neural Network (CNN) model needs high accuracy and low training and validation loss, and building such a model requires a large training dataset [116]. However, only a limited number of images was accessible for model training. Image augmentation has proven to be a powerful technique for enriching an image dataset and an effective way to enhance model performance [67,117,118]. Supported by the improvements reported in the literature, we have chosen to use image augmentation to improve the accuracy of the CNN-based object coverage detectors.
The main objective of applying image augmentation is to enlarge the training image dataset. Different techniques, such as geometric transformation, random erasing, colour space transformation and feature space augmentation, have been applied for image augmentation. The "ImageDataGenerator" class from the Keras library has been used for image augmentation to improve image classification accuracy [69,119]. Images are augmented by varying parameters such as zoom, height shift, width shift, shear and brightness within the values shown in Table 1. Random values are drawn within the range of each parameter to increase the diversity of the image dataset. By applying image augmentation, the number of training images was increased approximately tenfold.
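The random parameter sampling described above can be sketched without any image library. The ranges below are placeholders, not the values of Table 1, and the choice of nine augmented copies per original image (giving roughly ten times as many images in total) is an assumption made only to match the reported growth.

```python
import random

# Placeholder ranges: the actual values are those listed in Table 1.
PARAM_RANGES = {
    "zoom": (0.8, 1.2),
    "height_shift": (-0.2, 0.2),
    "width_shift": (-0.2, 0.2),
    "shear": (0.0, 0.2),
    "brightness": (0.5, 1.5),
}

def sample_params(rng):
    """Draw one random value per augmentation parameter within its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def augmentation_plan(image_ids, copies_per_image=9, seed=0):
    """List (image_id, params) pairs describing every augmented copy to generate."""
    rng = random.Random(seed)
    return [
        (image_id, sample_params(rng))
        for image_id in image_ids
        for _ in range(copies_per_image)
    ]

plan = augmentation_plan(["img_001", "img_002", "img_003"])
print(len(plan))  # 27 augmented copies for 3 original images
```

In practice, each sampled parameter set would be handed to an image transformation routine such as the Keras generator mentioned above; that step is omitted here.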

Image Annotation and Coverage Level
One of the major issues in object coverage detection is finding the coverage proportion of each object within the image. The object count method [120,121] has been applied as an approach to find the density or proportion of object area within an image. However, small objects appearing in a group have been a limiting factor for object count analysis [122]. In this work, the object count method is not viewed as a feasible option, as leaves, plastic and bottles are small in size and often appear in groups. Additionally, water and mud cannot be counted in discrete numbers. The object count method also does not appropriately address the proportion of area covered. This can be illustrated with the example of leaf coverage shown in Figure 7a,b. Both images have approximately 53% coverage of leaves, yet there is a considerable difference in the number of leaves in each image. Furthermore, those figures show no uniform pattern of coverage, and the size and shape of the objects vary from image to image. Therefore, counting objects within the image would not be a feasible way to categorise images into different levels. To overcome these complexities, image annotation techniques have been applied that mark the objects of interest within the images. Using the annotation tool "labelImg" (https://github.com/tzutalin/labelImg), images were manually annotated; an example of an annotated image is shown in Figure 8. During annotation, boxes (boundaries) are drawn around the objects in the image. There may be one or more boxes for an individual object type within the image, depending on the position and orientation of the objects. In most cases, each object covers some proportion of the total image area. However, there are overlapping boxes, mainly where water is present in the image.
The next challenge was to categorise the annotated images into different coverage levels. By analogy with the object count method within a frame, which is used for density calculation, we defined the coverage level [123] using the percentage of area covered. To find the coverage percentage of each object within an image, the area of each box was calculated from the box coordinates. The box areas of each object were then summed and divided by the image area to give the coverage percentage of that object. Based on the coverage percentage ranges listed in Table 2, annotated images were categorised into four coverage levels. Examples of the coverage level and corresponding coverage percentage for the three images shown in Figure 9a-c are listed in Table 3.
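The coverage calculation can be sketched as follows. The box format (x_min, y_min, x_max, y_max in pixels) and the function names are assumptions. The level boundaries are also hypothetical, because Table 2 is not reproduced here; only the statement that level Three corresponds to more than 50% coverage comes from the text.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)

def coverage_percentage(boxes, image_width, image_height):
    """Sum the box areas of one object type and express them as a share of the image.

    Box areas are simply summed, as described in the text, so overlapping
    boxes of the same object are counted twice in this sketch.
    """
    total = sum(box_area(b) for b in boxes)
    return 100.0 * total / (image_width * image_height)

# Hypothetical upper bounds (percent) for levels Zero, One and Two.
LEVEL_UPPER_BOUNDS = [(5.0, 0), (25.0, 1), (50.0, 2)]

def coverage_level(percentage):
    """Map a coverage percentage to a level from 0 to 3."""
    for upper, level in LEVEL_UPPER_BOUNDS:
        if percentage <= upper:
            return level
    return 3  # more than 50% coverage

leaf_boxes = [(0, 0, 50, 40), (60, 10, 100, 60)]  # hypothetical annotations
pct = coverage_percentage(leaf_boxes, image_width=100, image_height=100)
print(pct, coverage_level(pct))  # 40.0 2
```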

Coverage Detector Implementation
Object coverage detectors have been modelled, trained and implemented to detect the coverage level of each object within the image. A separate coverage detector has been designed as a CNN model for each of the four object types: leaves, plastic and bottles, mud, and water.

Convolutional Neural Network
A CNN is a feed-forward neural network made up of layers with learnable parameters, including weights and biases. A CNN is typically composed of multiple interconnected layers. The layers used in our CNN models are summarised below.
Convolutional layers: the convolutional layer is the first layer in the CNN model and works as a feature extractor, extracting features such as edges or textures from the input image. This layer applies a convolution operation to the input image and passes the extracted output to the next layer in the model. The three-dimensional input, in terms of height, width and number of channels, is applied to the convolutional layer to produce a feature map as output. In this work, the number of convolutional layers was adjusted based on the model training accuracy. After several iterations of adjusting the model parameters, and based on the best training accuracy, two convolutional layers were set for the leaves and water coverage detector CNN models, whereas three convolutional layers were set for the plastic and bottles and the mud coverage detector CNN models.
Pooling layers: a pooling layer is a down-sampling layer, which is commonly applied after the convolution layer and takes feature maps as its input. The main purpose of applying pooling layers is to minimise the spatial resolution of the feature maps. Max pooling was implemented to progressively reduce the spatial size of the feature maps, and hence, reduce the number of parameters and computation in the network.
Flatten Layer: a flatten layer transforms the entire pooled feature map matrix into a single column. The transformed values are then fed to the next layer in the network for further processing. Flatten layers are applied to increase the training accuracy.
Dense Layer (a fully connected layer): several convolutional, pooling and flatten layers are applied in combination to extract features. The dense layer that follows these layers interprets the feature representations and performs the reasoning. In this work, the dense layer classifies the input image into four levels. A softmax layer has been used to calculate the final probability of each class.
Activation function: the activation function decides whether or not a node in the network should be activated. The activation function introduces non-linearity to the network, which analyses whether the information that a node is receiving should be stored or ignored. ReLU function was applied as the activation for the CNN models.
Four CNN models are designed by altering the number of layers. The CNN model architecture for the leaves coverage detector is shown in Figure 10. The other coverage detectors differ only in the number of convolutional and pooling layers used.
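To make the effect of the layer count concrete, the sketch below traces the spatial size of the feature maps through a stack of convolution and max-pooling blocks and computes the length of the flattened vector. The 3x3 kernel, 2x2 pooling, 64 filters in the last block and the pairing of each convolution with one pooling layer are assumptions for illustration; the actual architecture is the one shown in Figure 10.

```python
def conv_output_size(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride): zero padding preserves size at stride 1
    return (size - kernel) // stride + 1  # "valid": no padding

def pool_output_size(size, pool=2):
    """Spatial output size of non-overlapping max pooling along one dimension."""
    return size // pool

def flattened_length(input_size, n_conv_blocks, filters_last, kernel=3, pool=2):
    """Length of the vector produced by the flatten layer for a square input."""
    size = input_size
    for _ in range(n_conv_blocks):
        size = conv_output_size(size, kernel, padding="same")
        size = pool_output_size(size, pool)
    return size * size * filters_last

# Two blocks (as for leaves and water) versus three blocks (plastic and bottles, mud).
print(flattened_length(128, n_conv_blocks=2, filters_last=64))  # 65536
print(flattened_length(128, n_conv_blocks=3, filters_last=64))  # 16384
```

Under these assumptions, each additional convolution and pooling block reduces the flattened vector, and hence the number of dense-layer weights, by a factor of four.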

Model Regularisation and Parameter Selection
One of the most common problems in CNN training is overfitting. Regularisation has been applied as one of the approaches to improve CNN training accuracy [124]. Regularisation modifies the model's learning parameters such that the performance of the model improves during training. In this work, the following techniques have been applied for model regularisation.
L2 (lambda): L2 regularisation is also known as weight decay. It has been applied so that the weights are driven toward zero. The lambda regularisation parameter is set to 0.001.
Dropout: dropout randomly removes the output of some of the hidden nodes during training and updates only the remaining nodes. We explored dropout probabilities from 0.25 to 0.5. After analysing this range, a dropout probability of 0.4 was set for the model, as it gave the best training accuracy.
Image Augmentation: a large number of training images has been generated using data augmentation, as discussed in Section 4.2.
Padding: padding is mainly applied to add columns and rows with zeroes to keep the spatial sizes the same after convolution. Padding has been applied such that the output has the same length as the original input.
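The two numeric settings above can be illustrated with a small sketch. It uses one common form of the L2 penalty (lambda times the sum of squared weights) and the inverted-dropout convention in which surviving activations are rescaled; both conventions are assumptions, as the text does not specify which variants were used.

```python
import random

L2_LAMBDA = 0.001  # regularisation parameter stated in the text
DROPOUT_P = 0.4    # dropout probability stated in the text

def l2_penalty(weights, lam=L2_LAMBDA):
    """Penalty added to the loss: lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def dropout(activations, p=DROPOUT_P, rng=None):
    """Zero each activation with probability p and rescale the survivors by 1 / (1 - p)."""
    if rng is None:
        rng = random.Random(0)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

weights = [1.0, -2.0, 3.0]          # hypothetical weights
print(l2_penalty(weights))           # 0.001 * (1 + 4 + 9), about 0.014
print(dropout([1.0, 1.0, 1.0, 1.0]))  # each entry is either 0.0 or 1 / 0.6
```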

Semantic Representation
The semantic representation of the blockage instances was created after analysing each expert's view on object identification, classification and reasoning. Structured knowledge is generated to represent the conceptual model of drainage and gully blockages. The relationships between object coverage levels and the classification instances "Fully Blocked", "Partially Blocked" and "No Blockage" are presented in the semantic representation shown in Figure 11. The concept map reflects the top-level concept of drainage and gully blockages. This representation shows the major components of an image in the combinations that define the instances of blockage. The "Fully Blocked", "Partially Blocked" and "No Blockage" instances are formulated based on objects and their coverage within the image. The semantic rule base has been created from the experts' reasoning when classifying images into the corresponding class labels. The drainage and gully blockage instances have been converted into an OWL (Web Ontology Language) ontology, as shown in Figure 12.

Rule-Based Formulation
Experts highlighted that an image is classified as "Fully Blocked" when many objects cover most of the image and there are signs of severe restriction of water flow through the drainage system, i.e., water cannot enter the drainage system because of the objects causing the blockage. Similarly, an image is classified as "Partially Blocked" when water is hindered from entering the drainage system but is still able to enter to some degree; in such cases, fewer objects block the drain, or the combination of objects is less severe. An image is classified as "No Blockage" when water can flow freely into the drainage system and no obstruction-causing objects are detected. Regarding flooding conditions, the experts also highlighted that flooding depends not only on whether drains and gullies can cope with the amount of water, but also on parameters such as location and rain duration. In this hybrid image classification model, the location and rain duration parameters are ignored, since they cannot be detected through image analysis. Mutually exclusive semantic rules are defined to classify the image instances based on object coverage detection, using the experts' knowledge of image classification. SWRL (Semantic Web Rule Language) rules have been created for all the possible instances depending on the object coverage levels and their combinations; an example of one such rule is presented in Figure 13. We discuss the rules in more detail in Section 6.2.
Figure 13. Sample of SWRL rule.
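Mutual exclusivity over all coverage combinations can be checked mechanically. The sketch below enumerates the 4^4 = 256 combinations of four objects and four levels and reports any combination for which more than one rule fires, or none. The three rules are hypothetical stand-ins written in Python, not the SWRL rules of the paper, and they ignore the water level purely for brevity.

```python
from itertools import product

OBJECTS = ("leaves", "plastic_bottle", "mud", "water")
LEVELS = (0, 1, 2, 3)

def _max_solid(c):
    """Highest coverage level among the solid objects (water is ignored here)."""
    return max(c["leaves"], c["plastic_bottle"], c["mud"])

# Hypothetical rule base: one predicate per class label.
HYPOTHETICAL_RULES = {
    "Fully Blocked": lambda c: _max_solid(c) == 3,
    "Partially Blocked": lambda c: _max_solid(c) in (1, 2),
    "No Blockage": lambda c: _max_solid(c) == 0,
}

def all_combinations():
    """Every assignment of a coverage level to each object."""
    return [dict(zip(OBJECTS, levels)) for levels in product(LEVELS, repeat=len(OBJECTS))]

def check_rule_base(rules):
    """Return (conflicts, uncovered): combinations matched by several rules or by none."""
    conflicts, uncovered = [], []
    for combo in all_combinations():
        fired = [label for label, rule in rules.items() if rule(combo)]
        if len(fired) > 1:
            conflicts.append((combo, fired))
        elif not fired:
            uncovered.append(combo)
    return conflicts, uncovered

conflicts, uncovered = check_rule_base(HYPOTHETICAL_RULES)
print(len(all_combinations()), len(conflicts), len(uncovered))  # 256 0 0
```

A check of this kind could flag overlapping rules before they reach the inference engine; combinations left uncovered would fall through to the default class.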

Experimental Design and Result Analysis
The experiments were performed on a machine with an Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz processor and 15.5 GB (usable) of RAM, running the Windows 10 64-bit operating system. For CNN model training, the image size, the number of epochs and the batch size were set by altering their values over different iterations. Input image sizes of 32, 64, 96 and 128 were applied in different iterations, the number of epochs varied from 100 to 500, and the batch size varied from 8 to 32. The models with the best training and validation accuracy, with the corresponding setup, were used for object coverage detection.
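The search over training setups can be pictured as a simple grid. In the sketch below, the image sizes come from the text, while the epoch step of 100 and the intermediate batch size of 16 are assumptions, since only the ranges are reported. Selecting on validation accuracy alone is also a simplification of the stated criterion.

```python
from itertools import product

IMAGE_SIZES = (32, 64, 96, 128)
EPOCHS = (100, 200, 300, 400, 500)  # step of 100 is an assumption
BATCH_SIZES = (8, 16, 32)            # the value 16 is an assumption

def configurations():
    """Enumerate every combination of image size, epochs and batch size."""
    return [
        {"image_size": s, "epochs": e, "batch_size": b}
        for s, e, b in product(IMAGE_SIZES, EPOCHS, BATCH_SIZES)
    ]

def best_configuration(results):
    """Pick the configuration with the highest validation accuracy.

    `results` is a list of (configuration, validation_accuracy) pairs
    produced by whatever training routine is used.
    """
    return max(results, key=lambda r: r[1])[0]

print(len(configurations()))  # 60 candidate setups
```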

Object Coverage Detection Training
For the evaluation of model training, the training accuracy, training loss, validation accuracy and validation loss have been analysed iteratively for each object coverage detector. The models with the best values of these measures, as listed in Table 4, have been used for object coverage detection. These training performance measures are not the same for every object coverage detector: training accuracy ranges from 0.9626 to 0.9983 and validation accuracy ranges from 0.7727 to 0.8955. Plots of the training accuracy and loss and the validation accuracy and loss over 300 iterations are presented in Figure 14a-d. These plots show that the models' accuracies improve significantly up to about 60 iterations, after which the accuracy and loss appear to stabilise. Therefore, the training of the models is limited to 300 iterations.

Analysis of Semantic Rules Implementation
Class labels generated by the inference engine for the test images are compared with the class labels defined by experts for those images. The test images cover all three categories and contain the objects in various combinations; some images contain only one of the listed objects, whereas others contain more than one. Figure 15 shows examples of the implementation of semantic rules for image classification. The following examples illustrate how the matched rules are applied based on the object coverage detection, and hence how the classification decision is reached. Analysis of rule selection is presented for three sample images of the category "Fully Blocked": one that was correctly classified as "Fully Blocked" and two that were incorrectly classified as "Partially Blocked" and "No Blockage". In scenarios where no rule is selected, the default rule "No Blockage" is used to define the class.
In the first image (Figure 15a), the object coverage detectors detect Mud level Three, Leaves level One, Plastic and Bottle level Zero and Water level Zero. This combination selects the following rule, which correctly classifies the image.

Hybrid Class Performance Analysis
In general, the accuracy of image classifier models has been evaluated in a single step [60,64,65,125]: model performance is evaluated in terms of correctly classifying the test images into the corresponding class labels. Those models were built using machine learning algorithms, so their performance was evaluated in terms of model test accuracy. For the proposed hybrid image classifier, the performance of the model was analysed in two stages. First, the accuracy of the object coverage detectors is analysed; second, the accuracy of the hybrid image classifier after applying the semantic rules is analysed.

Accuracy of the Object Coverage Detector
Accurate detection of objects' coverage levels within an image is crucial for the implementation of the semantic rules for image classification, as the accuracy of the semantic rules depends on the accuracy of the coverage level detection. Individual object coverage detectors have been tested with 21 sample annotated images. The result is summarised in the combined confusion matrix of the object coverage detectors in Table 5. The confusion matrix shows that the detector models do not have uniform accuracy. The accuracies for level Zero and level Three were found to be higher than those for level One and level Two. At level Zero, the representative object has no or very low coverage within the image, so hardly any characteristic features of that object are generated during model training, which could lead to higher accuracy for level Zero detection. At level Three, there is more than 50% coverage of the representative object, so the characteristic features of the object are strongly represented. Because of this, the models achieve higher accuracy at level Three than at level One and level Two. At levels One and Two, coverage of the representative object is lower, i.e., features of other objects or of the image background were extracted during model training, and hence the test accuracy is lower. The analysis showed that the object detection accuracy ranged from 61.9% to 76.2% and that the aggregated accuracy of the object coverage detectors was 69.04%, as plotted in Figure 16.
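The accuracy figures above follow from confusion matrices in the usual way: correct predictions lie on the diagonal. The sketch below uses a hypothetical 4x4 matrix for a single detector over 21 test images; it is not the content of Table 5 and was chosen only so that 16 of 21 images are correct, which is consistent with the highest reported accuracy of 76.2%.

```python
def accuracy(confusion):
    """Share of correct predictions: diagonal sum divided by the total count."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def per_level_recall(confusion):
    """Fraction of images of each true level that were assigned that level."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(confusion)]

# Hypothetical matrix: rows are true levels Zero..Three, columns are predicted levels.
hypothetical = [
    [5, 1, 0, 0],
    [1, 3, 1, 0],
    [0, 2, 3, 0],
    [0, 0, 0, 5],
]
print(round(100 * accuracy(hypothetical), 1))  # 76.2
print(per_level_recall(hypothetical))
```

In this illustrative matrix the per-level values are higher for levels Zero and Three than for levels One and Two, mirroring the pattern described in the text.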

Accuracy of the Hybrid Image Classifier
In the next stage, 26 test images are classified into the corresponding class labels using semantic rules. These test images are selected such that they reflect the best-match scenarios for drainage and gully blockage that can cause flooding. The classification accuracy of the proposed hybrid classifier has been compared with that of a machine learning-based classifier built with deep learning. Performance has been evaluated on the accuracy of correctly classifying Fully Blocked (FB) images as FB, Partially Blocked (PB) images as PB and No Blockage (NB) images as NB, along with wrong classifications such as FB images classified as PB, PB images classified as NB and so on, as shown in Figure 17. Figure 17 also compares the overall accuracy of the machine learning-based classifier and the hybrid classifier. Figures 18-20 present a further sensitivity analysis in terms of precision, recall and F-score for the individual instances. We observe that the hybrid classifier outperforms the machine learning classifier.
Figure 17 shows that the hybrid image classifier performs better than the machine learning-based classifier. The hybrid classifier has better accuracy on all the true positive classifications, that is, classifying "Fully Blocked" as "Fully Blocked", "Partially Blocked" as "Partially Blocked" and "No Blockage" as "No Blockage". The overall accuracy of the hybrid classifier is 69.23%, which is an improvement of about 2% in accuracy compared to the machine learning-based classifier. It has been reported in the literature that the accuracy of a machine learning-based classifier depends on the size of the dataset used for CNN model training [126]. Since few training images were available for CNN model training, the machine learning-based model is limited in the accuracy it can achieve. Apart from adjusting the model's training parameters, there has been no other explainable reasoning or control over model training to enhance accuracy. Sensitivity analyses in terms of precision, recall and F-score for "Fully Blocked", "Partially Blocked" and "No Blockage" show improved classification performance for each instance when the hybrid classifier is applied. The accuracy analysis of the hybrid classifier is a two-stage analysis: the overall accuracy depends on both the accuracy of the object coverage detectors and the implementation of the semantic rules. In this work, the context of object presence and coverage levels has been important for defining the class label. The use of semantic rules provides the flexibility to apply expert knowledge for context analysis. Expert knowledge in the form of semantic rules reflects the context of the image, which helps to classify an image correctly into the corresponding class label.
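For readers who wish to reproduce this kind of sensitivity analysis, the sketch below derives per-class precision, recall and F-score from a three-class confusion matrix. The matrix is hypothetical and is not the data behind Figures 17-20; it was chosen only so that 18 of 26 images are correct, which is consistent with the reported overall accuracy of 69.23%.

```python
LABELS = ("Fully Blocked", "Partially Blocked", "No Blockage")

def class_metrics(confusion, labels=LABELS):
    """Per-class precision, recall and F-score; rows are true labels, columns predicted."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(len(labels))) - tp
        fn = sum(confusion[i]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
        metrics[label] = {"precision": precision, "recall": recall, "f_score": f_score}
    return metrics

def overall_accuracy(confusion):
    """Diagonal sum divided by the total number of classified images."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total

# Hypothetical confusion matrix over 26 test images.
hypothetical = [
    [7, 2, 1],
    [2, 5, 1],
    [0, 2, 6],
]
print(round(100 * overall_accuracy(hypothetical), 2))  # 69.23
for label, m in class_metrics(hypothetical).items():
    print(label, {k: round(v, 3) for k, v in m.items()})
```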
In the literature, Explainable AI is reported to present challenges arising from selective decision making that focuses on explanations and background knowledge [127], from the large amount of information involved [128] and from case-specific decision making [129]. We faced further challenges in this work, which might be useful to other researchers working on utilising Explainable AI for developing Smart Cities solutions.
Availability of domain knowledge in the form of ontologies or knowledge graphs: in our case, we needed a domain-specific ontology that reflects the flood case scenarios. We had to create this ontology from scratch, as the existing ontologies do not cover the case we have worked on.
Access to domain experts: if no open domain knowledge about the problem area under investigation exists, then domain knowledge in the form of ontologies has to be created in close consultation with experts. This is also often an iterative process.
Skillset in knowledge engineering: knowledge engineering is a complex field, and capturing tacit knowledge is a complex task requiring specialist skillsets to build Explainable AI-based systems.

Conclusions and Future Work
In this work, we have made a case for 'Explainable AI' with a hybrid image classification model consisting of an ontological representation of the domain, including rules captured with the help of domain experts, and a DL-based classifier. This hybrid model is applied in a real-world use case involving a flood monitoring application in the context of a Smart Cities EU project. The hybrid model is based on machine learning and semantic technology, and classifies drainage and gully images into one of three blockage instances: "Fully Blocked", "Partially Blocked" or "No Blockage". We captured expert input as semantic rules and used them in our hybrid approach to identify four major objects primarily responsible for drainage and gully blockages. A set of images was assigned classification labels based on the knowledge elicitation process with experts in this project. The Semantic Web Rule Language was applied to define relationships between the coverage levels of the various objects within the image. In this hybrid image classification task, the classification process was divided into two phases. In the first phase, we applied object coverage detection, in which object detectors were modelled and trained using CNNs. To build our CNN models, publicly available images were manually annotated and then used for training. The annotated images were classified into four coverage levels based on the coverage percentage of each object within the image. In the second phase, semantic rules were applied based on the coverage of the objects within the image to classify the image instance. The accuracy of our proposed hybrid image classification model was analysed in two stages, covering both the object coverage detectors and the hybrid classifier based on inferencing over semantic rules. The accuracy analysis demonstrated an improvement in the accuracy of the hybrid classifier in comparison to the machine learning-based image classifier.
The major advantage of the hybrid image classifier is that it gives the flexibility to incorporate experts' knowledge in the classification process. In the machine learning-based classifier, the classification performance depends only on the training of the model, whereas the hybrid classifier not only improves classification accuracy but also incorporates the experts' knowledge. The analysis showed that some rules correctly predict the class label as "Partially Blocked", whereas some rules predict incorrect class labels, for example classifying a "Partially Blocked" image as "No Blockage". The hybrid model transparently showed which rules perform well and which rules need to be revised.
In future work, the accuracy of both the object coverage detectors and the hybrid classifier will be enhanced. The accuracy of the coverage detectors will be improved by adding a larger number of application-focused images for training. Besides these four object coverage detectors, detectors for other objects, such as grates, can also be included. The rules that have been identified as the main contributors to low accuracy will be re-evaluated and rewritten to improve the overall classification accuracy. Future work will also explore the possibility of including additional contextual information, such as weather and rainfall (higher rainfall may increase surface run-off), seasonality (autumn involves increased leaf fall) or the occurrence of local events (local street events may increase litter), which could further help improve our approach. Fuzzy rules can be applied for knowledge representation and rule generation when the domain knowledge involves some degree of vagueness.
Author Contributions: D.T.: conceptualisation, methodology, software, validation, formal analysis, investigation, writing-review and editing, supervision, project administration, funding acquisition. B.K.M.: conceptualisation, methodology, software, validation, formal analysis, investigation, writing-review and editing, visualisation. A.A.: conceptualisation, methodology, software, validation, formal analysis, investigation, writing-review and editing. S.S.: validation, supervision, project administration, funding acquisition. S.M.: formal analysis, investigation, writing-review and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Interreg EC grant as part of Smart Cities and Open Data REuse (SCORE) project.