Fuzzy Based Prediction Model for Air Quality Monitoring for Kampala City in East Africa

Abstract: The quality of air affects human lives and the environment at large. Poor air quality has claimed many lives and degraded the environment across the globe, and more severely so in African countries, where air quality monitoring systems are scarce or non-existent. In Africa, polluted air results from growing industrialization, urbanization, flights, and road traffic. Air pollution remains a silent killer, especially in Africa, and if it is not dealt with, it will continue to cause health problems such as heart disease, stroke, and chronic lung disease, which can ultimately lead to death. In this paper, a Kampala Air Quality Index prediction model based on a fuzzy logic inference system was designed to determine the air quality of Kampala city from the concentrations of three air pollutants: nitrogen dioxide, sulphur dioxide, and fine particulate matter (PM2.5). The results indicate that fuzzy logic algorithms are capable of determining the air quality index and can therefore be used to predict and estimate it in real time from given air pollutant concentrations. This can help reduce the effects of air pollution on both humans and the environment.


Introduction
Many industrialized and developing cities across the globe suffer from polluted air for most of the year [1]. Polluted air has a major impact both globally and locally; if ignored, it can become a threat to all living things, especially humans. According to [2][3][4], many African countries have gone through decades of industrialization and development without a proper plan to address air pollution.
Many developed countries have used emerging technologies to devise strategies and methods that improve air quality and, in the long run, mitigate air pollution, but most African countries still lack air pollution monitoring systems and management strategies [5][6][7].
There is a great need to address air pollution in Africa, where urbanization and industrialization continue to increase along with population density. This paper focuses on Kampala City in Uganda, in the East African region, where the population keeps growing every year. Moreover, new industries are being opened, and large numbers of second-hand vehicles and motorcycles, which are the major means of transport, enter the city at increasing rates every year [8].
According to the AirVisual World Air Quality Report, Kampala is among the cities with the most polluted air in Africa [9]. Air pollution has a major effect on health and causes millions of hospitalizations every year.
Major sources of air pollution in Kampala city include dust from unpaved roads and the open burning of uncollected waste by individuals, which releases dangerously large amounts of pollutants into the air. Factories and power plants around Kampala also emit large amounts of pollutants [10,11].
Another major source of air pollution is vehicle emissions from the many imported second-hand vehicles; the Uganda National Environment Management Authority (NEMA) estimates that idling cars burn more than 140,000 litres of fuel every day in Kampala city because of ever-growing road traffic [12].
A report compiled by the Global Burden of Disease project indicates that exposure to polluted air is the fifth-highest-ranking risk factor for death, responsible for 4.2 million deaths from heart disease and stroke, lung cancer, chronic lung disease, and respiratory infections; in addition, 254,000 deaths worldwide were attributed to ozone exposure and its impact on chronic lung disease. In Africa alone, an estimated one million deaths were attributed to air pollution [13].
Measuring air quality is therefore an important health concern. Information on pollutant concentrations in a particular region and their health effects is usually presented through the Air Quality Index (AQI), which expresses air quality in a form the public can readily understand. The AQI is a public information tool designed to help people understand the effects of air quality on health and the environment, and it provides a standard way of describing air quality around the world.
In this paper, the AQI is based on Kampala air quality standards, which follow the Environmental Protection Agency (EPA) standards [14]. The Kampala Air Quality Index values are divided into six groups: good, moderate, unhealthy for sensitive groups, unhealthy, very unhealthy, and hazardous, and each group is assigned a different color.

Traditional methods for predicting air quality, such as cluster analysis, regression analysis, and analysis of variance, have been widely used, but they do not give the desired accuracy because of the non-linear relationships between pollutant datasets. The methods used in [15] relied on an impinger air quality testing apparatus, which is very expensive and therefore difficult for the governments of developing countries to fund [10]. In another approach, fine particulate matter (PM2.5) samples are collected at the end of each sampling period, stored in sealed plastic Petri dishes, and shipped to the U.S.A. for analysis. Therefore, in this paper, a fuzzy-based model for predicting the AQI from air pollutant concentrations in Kampala city is designed, in order to give the public information about air quality and to help the responsible authorities take precautions and make informed decisions about air pollutant levels. Fuzzy logic modeling is well suited to this task because it handles non-linear problems effectively.
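The six KAQI groups follow the EPA AQI categories. The sketch below maps an index value to its group and display color, assuming that the standard EPA AQI ranges and colors (0-50 green, 51-100 yellow, 101-150 orange, 151-200 red, 201-300 purple, 301-500 maroon) carry over unchanged to the KAQI; the helper name `kaqi_category` is hypothetical.

```python
# EPA AQI categories, assumed here to apply unchanged to the KAQI:
# (upper bound of the index range, category, display color)
AQI_CATEGORIES = [
    (50, "Good", "green"),
    (100, "Moderate", "yellow"),
    (150, "Unhealthy for Sensitive Groups", "orange"),
    (200, "Unhealthy", "red"),
    (300, "Very Unhealthy", "purple"),
    (500, "Hazardous", "maroon"),
]


def kaqi_category(index: float) -> tuple[str, str]:
    """Return (category, color) for an index value (hypothetical helper)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    for upper, name, color in AQI_CATEGORIES:
        if index <= upper:
            return name, color
    # Values beyond the top of the scale are still reported as Hazardous.
    return "Hazardous", "maroon"


print(kaqi_category(27))   # ('Good', 'green')
print(kaqi_category(361))  # ('Hazardous', 'maroon')
```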

Main Air Pollutants
The Kampala AQI is computed from the concentrations of three air pollutants in Kampala: nitrogen dioxide (NO2), sulphur dioxide (SO2), and fine particulate matter (PM2.5), based on the 24-hour average of hourly readings.

Nitrogen Dioxide
NO2 is mainly produced by burning fossil fuels in internal combustion engines, such as those of cars, and in power plants and home heaters. Exposure irritates the eyes, nose, and throat, and direct contact with the skin or eyes can cause burns. Long-term exposure to relatively low levels of nitrogen dioxide is believed to cause bronchitis and asthma, especially in children [16]. Kampala has thousands of taxis, which are the main means of public transport around the city and therefore contribute to NO2 emissions.

Sulphur Dioxide
SO2 is emitted through the burning of fossil fuels (for vehicles, heating, and power generation) and the processing of sulphur-containing ores. Exposure to SO2 irritates the eyes and lungs, causing coughing and aggravating chronic bronchitis and asthma. Higher SO2 levels are correlated with mortality from cardiac diseases [17].
Vehicle numbers in Kampala city are growing rapidly, and most of the fleet consists of old vehicles running on low-quality fuel on a poorly planned public transport system and road network [18].

Particulate Matter (PM)
PM refers to air pollutants consisting of a complex mixture of particles of various sizes and compositions suspended in the air. PM is produced by both natural and anthropogenic activities. The main sources of particulate pollution are industrial activities, power plants, motor vehicles, construction activity, fires, and natural windblown dust. The major industries in Kampala include sugar, brewing, tobacco, cotton textiles, cement, and steel production [19]. PM mass concentration is typically tracked both as PM10, the total mass of PM with a diameter of 10 micrometers or less, and as PM2.5, the total mass of PM with a diameter of 2.5 micrometers or less (a subset of PM10). This paper concentrates on PM2.5, nitrogen dioxide, and sulphur dioxide.

Materials and Methods
In this study, a simulation based on fuzzy logic techniques was carried out in the MATLAB R2017b environment using the MATLAB Fuzzy Logic Toolbox. The fuzzy prediction model was built and its behavior observed using three input parameters: the indices of PM2.5, SO2, and NO2. The estimated Kampala Air Quality Index (KAQI) was taken as the output parameter. A rule-based Mamdani fuzzy inference system was used to process the model, followed by defuzzification.

Description of Fuzzy Logic
In traditional (Boolean) logic, the degree of truth is either 1 (true) or 0 (false). This is limiting because the membership of some elements is unclear, which makes traditional methods incapable of handling complex environmental problems that involve some vagueness. In a crisp set, an element either is or is not a member of the set, yet many real quantities, such as height, do not fall cleanly into one category: a person can be somewhat tall without being clearly tall. Fuzzy logic was introduced to handle such vagueness in real-life problems. In fuzzy logic, the degree of truth ranges between 0 and 1, inclusive, so fuzzy sets allow elements to belong partially to a set.
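The difference between a crisp set and a fuzzy set can be sketched with the height example. The thresholds below (a crisp cut-off at 1.80 m, and a fuzzy "tall" set rising linearly from 1.60 m to 1.90 m) are hypothetical values chosen only for illustration.

```python
def crisp_tall(height_m: float) -> int:
    """Classical (crisp) set: membership is either 0 or 1 (hypothetical 1.80 m cut-off)."""
    return 1 if height_m >= 1.80 else 0


def fuzzy_tall(height_m: float) -> float:
    """Fuzzy set: membership rises linearly from 0 at 1.60 m to 1 at 1.90 m (hypothetical)."""
    if height_m <= 1.60:
        return 0.0
    if height_m >= 1.90:
        return 1.0
    return (height_m - 1.60) / (1.90 - 1.60)


for h in (1.55, 1.75, 1.79, 1.81, 1.95):
    print(h, crisp_tall(h), round(fuzzy_tall(h), 2))
```

Heights of 1.79 m and 1.81 m receive crisp memberships of 0 and 1 but nearly equal fuzzy memberships (about 0.63 and 0.70), which is the partial membership that the crisp set cannot express.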
Fuzzy logic makes it possible to compute with linguistic variables, that is, variables whose values are not numbers but words or sentences in a natural or artificial language, as proposed by Lotfi Zadeh of the University of California in the 1960s [20]. According to Banks [21], fuzzy logic can efficiently handle complex soft computing problems. Its techniques have been widely applied in many areas of today's society, such as industrial manufacturing, diagnosis, automation control, academic education, and forecasting. A linguistic variable is a quintuple <x, T(x), U, G, M>, where x is the variable name, T(x) is the set of terms, U is the universe of discourse, G is the set of syntactic rules that generate the terms, and M is the set of semantic rules that assign a meaning (a fuzzy set on U) to each term. Fuzzy logic works well in designing non-linear, complex control solutions with multiple parameters for the following reasons [22]:
• Fuzzy logic can describe systems through a combination of numeric and linguistic means.
• Fuzzy logic measures the degree of certainty or uncertainty of an element's membership in a set.
• Fuzzy algorithms are often robust, in the sense that they are not very sensitive to changing environments or to erroneous or missing rules.
In other words, the fuzzy logic method expresses how acceptable a given air pollutant level is as a continuous value between 0 and 1. Fuzzy logic uses IF-THEN inference rules with suitable linguistic descriptions. A fuzzy rule is written as IF situation THEN conclusion [23]. The situation is also called the rule premise or antecedent, and the conclusion is called the consequent; that is, IF the antecedent is satisfied, THEN the consequent is inferred.
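A minimal sketch of how one such rule is evaluated, assuming the common Mamdani choice of the minimum for AND (and the maximum for OR). The membership degrees are made-up placeholder values, and the rule is an illustrative example, not one copied from the paper's rule base.

```python
def and_(*degrees: float) -> float:
    """Fuzzy AND as the minimum (the usual Mamdani choice)."""
    return min(degrees)


def or_(*degrees: float) -> float:
    """Fuzzy OR as the maximum."""
    return max(degrees)


# Placeholder fuzzified inputs: membership degree of each input in each term.
memberships = {
    "NO2": {"Low": 0.0, "Medium": 0.3, "High": 0.7},
    "SO2": {"Low": 0.6, "Medium": 0.4, "High": 0.0},
    "PM2.5": {"Low": 0.2, "Medium": 0.8, "High": 0.0},
}

# Illustrative rule: IF NO2 is High AND SO2 is Low AND PM2.5 is Medium THEN KAQI is Unhealthy
antecedent = [("NO2", "High"), ("SO2", "Low"), ("PM2.5", "Medium")]
strength = and_(*(memberships[var][term] for var, term in antecedent))
print(strength)  # 0.6, the degree to which the antecedent holds (firing strength)
```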
The designed rules thus form the fuzzy inference knowledge base from which a generic fuzzy-based algorithm is generated. The resulting model represents the fuzzy function used to predict the Air Quality Index for Kampala city.

The Proposed Fuzzy Logic Control Model
The proposed model uses fuzzy control reasoning to predict the Kampala Air Quality Index from the given air pollutant levels. The simulations were run in MATLAB R2017b, which provides the Fuzzy Logic Toolbox. The fuzzy control model is designed to predict the KAQI from a set of predefined input parameters: NO2, SO2, and PM2.5.
The fuzzy logic system is built by following the principal steps shown in Figure 1: defining the input variables, fuzzification, formulation of the fuzzy inference rules, defuzzification, and model evaluation.

Defining the Input Variables and Fuzzification of the Values
In this paper, the KAQI model is designed on the basis of pollutant concentration levels, following the Environmental Protection Agency Air Quality Index guidelines.
The higher the AQI value, the poorer the air quality; the lower the value, the better the air quality. The three crisp input parameters used to define air quality in this paper are nitrogen dioxide (NO2), sulphur dioxide (SO2), and particulate matter (PM2.5). The output of the system is the KAQI. Both the inputs and the output are expressed as linguistic variables.

Selection of Membership Functions
A membership function (MF) specifies the degree to which a given input belongs to a set [25]. Its output, the degree of membership, always lies between 0 and 1. Membership functions are used in fuzzification and defuzzification to map crisp (non-fuzzy) input values to fuzzy linguistic terms and vice versa.
The MATLAB Fuzzy Logic Toolbox provides many built-in membership functions. In this paper, the triangular membership function (trimf) is used to design the proposed fuzzy-based prediction model. The triangular membership function is computationally efficient and maps each crisp input to a membership degree between 0 and 1.
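A triangular membership function is defined by three parameters a ≤ b ≤ c (left foot, peak, right foot), and MATLAB's trimf(x, [a b c]) evaluates max(min((x − a)/(b − a), (c − x)/(c − b)), 0). The Python sketch below evaluates the same shape for a scalar input; the parameter values in the example are hypothetical, not the paper's.

```python
def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet a, c and peak b.

    Evaluates max(min((x - a) / (b - a), (c - x) / (c - b)), 0), with the
    degenerate shoulder cases a == b or b == c handled explicitly.
    """
    if not a <= b <= c:
        raise ValueError("parameters must satisfy a <= b <= c")
    if x == b:
        return 1.0  # peak (also covers shoulders where a == b or b == c)
    if x <= a or x >= c:
        return 0.0  # outside the support of the triangle
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)  # falling edge


# Hypothetical "Medium" set on a 0-100 universe, peaking at 50.
print(trimf(30, 0, 50, 100))   # 0.6
print(trimf(75, 0, 50, 100))   # 0.5
print(trimf(120, 0, 50, 100))  # 0.0
```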
The Mamdani fuzzy inference system is suitable for designing AQI prediction models because both its inputs and its outputs are represented by the values of linguistic variables [26]. To transform crisp input values into fuzzy values, a membership function is defined for each input.
The corresponding fuzzy linguistic terms in this paper are defined as follows:
• Intensity of nitrogen dioxide (∆NO2): Low, Medium, and High.
• Intensity of sulphur dioxide (∆SO2): Low, Medium, and High.
• Intensity of particulate matter 2.5 (∆PM2.5): Low, Medium, and High.
The KAQI is defined and estimated as the output with the following terms: Good, Moderate, Sensitive, Unhealthy, Very Unhealthy, and Hazardous.
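To illustrate how a crisp reading is fuzzified into the three input terms, the sketch below assigns an NO2 concentration its membership degrees in Low, Medium, and High. The triangle parameters (in ppb) are hypothetical placeholders; the paper's actual boundary values are those given in Table 2.

```python
def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function (feet a, c; peak b)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


# Hypothetical NO2 fuzzy sets in ppb: (a, b, c) for each linguistic term.
NO2_SETS = {
    "Low": (0, 0, 100),
    "Medium": (50, 150, 250),
    "High": (200, 400, 400),
}


def fuzzify(value: float, sets: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Map a crisp value to its membership degree in every linguistic term."""
    return {term: trimf(value, *abc) for term, abc in sets.items()}


print(fuzzify(80, NO2_SETS))  # {'Low': 0.2, 'Medium': 0.3, 'High': 0.0}
```

With these placeholder sets, a reading of 80 ppb is partly Low and partly Medium at the same time.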

Formulation of Fuzzy Rules
Rules play a central role in fuzzy logic: they link the input membership functions to the output membership functions during the inference process. They take the generic IF-THEN form, and each fuzzy rule maps a condition described by linguistic variables and fuzzy sets to a desired output.
To design the model, the boundary values of the universal sets for the input and output variables are determined.
The fuzzy sets to be defined on these universes for fuzzification are then identified, and the boundary values of the universal sets are set as shown in Table 1. Each variable is represented by three fuzzy sets, 'Low', 'Medium', and 'High', in its universe [27]. To determine the boundaries of the 'Low', 'Medium', and 'High' sets, they are matched to the EPA categories: 'Low' corresponds to 'Good', 'Medium' to 'Moderate' and 'Sensitive', and 'High' to 'Unhealthy', 'Very Unhealthy', and 'Hazardous'. The fuzzy set values are then defined from the lower and upper boundary values of the universal sets. In this work, the values follow the boundaries indicated by the Environmental Protection Agency (EPA) standards [14].
The EPA is a United States government agency that prescribes standards and guidelines relating to air pollution, as elaborated in Figure 2 [28]. Table 2 shows the boundary values of the universal sets and fuzzy sets for the NO2, SO2, and PM2.5 input variables; the membership functions for the input fuzzy sets are defined based on these boundary values.

Table 2. The boundary values of crisp sets and fuzzy sets for input parameters, domain ranges, and universe of discourse for the membership functions. Columns: Crisp Input Variables | Fuzzy Input Parameters | Boundary Values for Universal Sets | Universe of Discourse for MFs.

The selected output variable is KAQI, represented by six fuzzy sets: 'Good', 'Moderate', 'Sensitive', 'Unhealthy', 'Very Unhealthy', and 'Hazardous'. The boundary values of these output fuzzy sets are determined from the value ranges used in the EPA standards, as indicated in Table 3. Table 4 gives the rule base, which defines the relationship between the input and output variables.
The fuzzy associative memory method is used to map input fuzzy values to the corresponding output fuzzy sets in order to generate the inference rules. With M = 3 input variables, each classified into N = 3 linguistic terms, N^M = 3^3 = 27 rules are generated in the rule base. The rules are formed based on the highest values of NO2, SO2, and PM2.5: all pollutant concentrations are taken into account, so that if any one pollutant is High, the resulting KAQI is Unhealthy, as indicated in the rule base.
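The 27 antecedents are simply every combination of one term per input, which `itertools.product` enumerates directly. In the sketch below, only the policy "one High pollutant gives Unhealthy" comes from the text; the remaining consequents are hypothetical placeholders for the actual rules listed in Table 4.

```python
from itertools import product

INPUTS = ["NO2", "SO2", "PM2.5"]
TERMS = ["Low", "Medium", "High"]


def consequent(antecedent: tuple[str, ...]) -> str:
    """Hypothetical consequent policy; the paper's actual rules are in Table 4."""
    highs = antecedent.count("High")
    mediums = antecedent.count("Medium")
    if highs == 3:
        return "Hazardous"
    if highs == 2:
        return "Very Unhealthy"
    if highs == 1:
        return "Unhealthy"  # from the text: one High pollutant gives Unhealthy
    if mediums >= 2:
        return "Sensitive"
    if mediums == 1:
        return "Moderate"
    return "Good"


# One rule per combination of terms: 3 terms ** 3 inputs = 27 rules.
rules = [(combo, consequent(combo)) for combo in product(TERMS, repeat=len(INPUTS))]

print(len(rules))  # 27
for combo, out in rules[:3]:
    clause = " AND ".join(f"{var} is {term}" for var, term in zip(INPUTS, combo))
    print(f"IF {clause} THEN KAQI is {out}")
```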

The Fuzzy Control System Design
As shown in Figure 3, the fuzzy-based air quality index prediction model is designed using the Fuzzy Logic Toolbox and a Mamdani FIS within the MATLAB environment. The figure illustrates how the different air pollutants affect the Air Quality Index.

Designs of the Input/Output Fuzzy Membership Functions
Figure 3 also shows sample input/output designs of the fuzzy inference system variables together with their respective membership function plots.
Figures 4-6 show the triangular membership function plots designed for NO2, SO2, and PM2.5. For instance, Figure 4 illustrates the input variable NO2 with its fuzzy sets (Low, Medium, and High) and the corresponding membership function plot. The estimated KAQI output variable is illustrated in Figure 7.

Evaluation of the Proposed Fuzzy Based KAQI Prediction Model
This section presents the fuzzy inference rules used to evaluate the output of the proposed fuzzy-based KAQI prediction model, as shown in Figure 8. The antecedents of each rule are joined with the AND connector, which the Mamdani system evaluates as the minimum of the membership degrees of the fuzzy input parameters; this minimum is the firing strength of the rule and determines the estimated Air Quality Index.
In the model evaluation, all fuzzy inference rules are assigned an equal weight (W = 1). All rules therefore have equal priority, and their order does not matter.

Rule and Surface Viewer
The rule viewer displays the fuzzy inference diagram. It serves as a diagnostic tool to see, for instance, which rules are active or how individual membership functions influence the result. Figure 9 shows in the rule view how the concentration levels of NO2, SO2, and PM2.5 affect the Kampala Air Quality Index. For example, for NO2 = 30 ppb, SO2 = 50 ppb, and PM2.5 = 20.9 µg/m³, the predicted KAQI is 27.
As indicated in Figure 10, the surface viewer shows the dependence of an output on one or two of the inputs; that is, it generates and plots an output surface map for the model. Here, the surface view shows how the output KAQI depends on the NO2, SO2, and PM2.5 concentrations: a change in any pollutant concentration changes the KAQI. The plot shows that an increase in any input pollutant concentration increases the KAQI, irrespective of the values of the other pollutants.

Defuzzification to Crisp Sets
Defuzzification is the process of converting a fuzzified output into a single crisp value in order to generate a readable output. In a fuzzy inference system controller, the defuzzified value represents the action to be taken to control the process. There are various defuzzification methods, each of which may produce a different result, and a method is chosen according to the nature of the problem being solved.
In this study, the Center of Gravity (COG) method is used because it is the most commonly used method compared with alternatives such as the mean of maxima, the maximum membership principle, and the weighted average method. COG is generally considered more accurate than these methods because it returns the value corresponding to the center of gravity of the aggregated output curve [29].
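On a sampled output universe, the COG integral becomes a weighted average of the sample points. The sketch below contrasts it with the mean of maxima on a hypothetical aggregated output set (a "Good" triangle clipped at 0.8 and a "Moderate" triangle clipped at 0.3); the set shapes and clipping levels are placeholders, not the paper's values.

```python
def tri(z: float, a: float, b: float, c: float) -> float:
    """Triangular membership function (feet a, c; peak b)."""
    if z == b:
        return 1.0
    if z <= a or z >= c:
        return 0.0
    return (z - a) / (b - a) if z < b else (c - z) / (c - b)


def cog(zs: list[float], mus: list[float]) -> float:
    """Discrete center of gravity: sum(mu(z) * z) / sum(mu(z))."""
    area = sum(mus)
    if area == 0:
        raise ValueError("output fuzzy set is empty (all memberships are 0)")
    return sum(z * mu for z, mu in zip(zs, mus)) / area


def mean_of_maxima(zs: list[float], mus: list[float]) -> float:
    """Average of the z values at which the membership is maximal."""
    peak = max(mus)
    top = [z for z, mu in zip(zs, mus) if mu == peak]
    return sum(top) / len(top)


# Hypothetical aggregated output on 0..500: max of two clipped triangles.
zs = [float(z) for z in range(501)]
mus = [max(min(0.8, tri(z, 0, 25, 50)), min(0.3, tri(z, 25, 75, 125))) for z in zs]

print(mean_of_maxima(zs, mus))  # 25.0
print(round(cog(zs, mus), 1))   # larger: COG also weighs the clipped "Moderate" part
```

The mean of maxima ignores everything outside the highest plateau, whereas COG is pulled toward the clipped "Moderate" region, which is the sense in which it uses the whole shape of the output curve.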
To calculate the crisp output with the COG method, let C be the aggregated output fuzzy set with membership function μ_C(z), where z is the KAQI. The defuzzified output z* is then

$$z^{*} = \frac{\int \mu_C(z)\, z \,\mathrm{d}z}{\int \mu_C(z)\,\mathrm{d}z}$$

Therefore, the defuzzified output for the input values NO2 = 30 ppb, SO2 = 50 ppb, and PM2.5 = 20.9 µg/m³ is a predicted KAQI of 27, as shown in Figure 9.
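The whole Mamdani pipeline described above (triangular fuzzification, AND as minimum with equal rule weights W = 1, clipping implication, max aggregation, and discrete COG defuzzification) can be sketched end to end in plain Python. All membership function parameters and most rule consequents below are hypothetical placeholders, not the values of Tables 2-4, so the outputs are not expected to reproduce the paper's Figure 9 result.

```python
from itertools import product

INPUTS = ("NO2", "SO2", "PM2.5")
TERMS = ("Low", "Medium", "High")

# Hypothetical triangular MFs (a, b, c); placeholders, not the paper's Table 2/3 values.
INPUT_MFS = {
    "NO2": {"Low": (0, 0, 100), "Medium": (50, 200, 360), "High": (250, 1000, 1000)},
    "SO2": {"Low": (0, 0, 75), "Medium": (35, 110, 185), "High": (150, 1000, 1000)},
    "PM2.5": {"Low": (0, 0, 35.4), "Medium": (12, 35.4, 55.4), "High": (45, 500, 500)},
}
OUTPUT_MFS = {
    "Good": (0, 0, 75),
    "Moderate": (25, 75, 125),
    "Sensitive": (75, 125, 175),
    "Unhealthy": (125, 175, 250),
    "Very Unhealthy": (175, 250, 350),
    "Hazardous": (300, 500, 500),
}


def trimf(x, a, b, c):
    """Triangular membership function (feet a, c; peak b)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def consequent(combo):
    """Hypothetical consequents; only 'one High -> Unhealthy' comes from the text."""
    highs, mediums = combo.count("High"), combo.count("Medium")
    if highs:
        return ("Unhealthy", "Very Unhealthy", "Hazardous")[highs - 1]
    return ("Good", "Moderate", "Sensitive", "Sensitive")[mediums]


def predict_kaqi(no2, so2, pm25, step=1.0):
    crisp = {"NO2": no2, "SO2": so2, "PM2.5": pm25}
    # 1. Fuzzification: membership degree of each input in each term.
    mu = {v: {t: trimf(crisp[v], *INPUT_MFS[v][t]) for t in TERMS} for v in INPUTS}
    # 2. Rule evaluation: AND = min, equal weights W = 1; keep the strongest rule per output term.
    strength = {}
    for combo in product(TERMS, repeat=len(INPUTS)):
        w = 1.0 * min(mu[v][t] for v, t in zip(INPUTS, combo))
        out = consequent(combo)
        strength[out] = max(strength.get(out, 0.0), w)
    # 3. Implication (min, i.e. clipping) and aggregation (max) on a sampled 0-500 universe.
    zs = [i * step for i in range(int(500 / step) + 1)]
    agg = [max(min(s, trimf(z, *OUTPUT_MFS[o])) for o, s in strength.items()) for z in zs]
    # 4. Defuzzification by the discrete center of gravity.
    area = sum(agg)
    if area == 0:
        raise ValueError("no rule fired for these inputs")
    return sum(z * m for z, m in zip(zs, agg)) / area


print(round(predict_kaqi(0, 0, 0)))  # 25 (only the all-Low -> Good rule fires)
print(round(predict_kaqi(30, 50, 20.9)))
```

Taking the maximum firing strength per output term before clipping is equivalent to clipping each rule's output set separately and then aggregating with the maximum, because min distributes over max.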

Performance Evaluation of the Designed KAQI Prediction Model
After the proposed model was successfully simulated, its performance was evaluated through a comparative analysis of the predictions obtained with the conventional method and with the fuzzy logic based method.
To carry out the comparative analysis, the Air Quality Index is first calculated with the linear interpolation formula:

$$I_p = \frac{I_{High} - I_{Low}}{BP_{High} - BP_{Low}}\left(C_p - BP_{Low}\right) + I_{Low}$$

where:
• I_p is the index for pollutant p;
• C_p is the monitored concentration of pollutant p;
• BP_High is the breakpoint that is greater than or equal to C_p;
• BP_Low is the breakpoint that is less than or equal to C_p;
• I_High is the AQI value corresponding to BP_High;
• I_Low is the AQI value corresponding to BP_Low.
The linear interpolation method is used to determine short- and long-term air quality indices, and it is used to estimate and predict unknown values for any geographic point data, such as rainfall, noise pollution, and air pollutant concentrations [30].
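The interpolation formula above can be sketched as a breakpoint-table lookup. The table used here is an assumption: the pre-2024 U.S. EPA breakpoints for 24-hour PM2.5, which need not match the breakpoints behind the paper's own worked example. EPA practice truncates PM2.5 to one decimal place before the lookup, and the sketch assumes its inputs are already truncated, so values falling between two rows (e.g. 12.05) are rejected.

```python
# Pre-2024 U.S. EPA breakpoints for 24-hour PM2.5 (ug/m3), used here as an assumption:
# (BP_Low, BP_High, I_Low, I_High)
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def sub_index(c_p: float, breakpoints) -> float:
    """I_p = (I_High - I_Low) / (BP_High - BP_Low) * (C_p - BP_Low) + I_Low."""
    for bp_low, bp_high, i_low, i_high in breakpoints:
        if bp_low <= c_p <= bp_high:
            return (i_high - i_low) / (bp_high - bp_low) * (c_p - bp_low) + i_low
    raise ValueError(f"concentration {c_p} is outside the breakpoint table")


print(round(sub_index(300, PM25_BREAKPOINTS)))  # 350
```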
In this study, NO2, SO2, and PM2.5 values were picked randomly from data collected from an open source and used to test and evaluate the model. For instance, if NO2 is 290 ppb, SO2 is 350 ppb, and PM2.5 is 300 µg/m³, the linear interpolation method computes an index for each pollutant, and the highest individual pollutant index represents the KAQI, that is, KAQI = max(I_NO2, I_SO2, I_PM2.5). This gives a KAQI of 331, whereas the fuzzy inference system produces 361 for the same data. Table 5 shows some of the observations made to compare the performance of the fuzzy logic and linear interpolation methods for given SO2, NO2, and PM2.5 concentrations. The fuzzy approach gives satisfactory results, which makes it a promising approach to estimating the KAQI. When the results of the two approaches are plotted against each other for further comparison, a strong correlation is observed between them, as shown in Figure 11.

Conclusions
In this paper, a KAQI prediction model based on a fuzzy logic inference system was designed to predict the air quality of Kampala city from air pollutant concentrations. Fuzzy logic algorithms were observed to be capable of determining the Air Quality Index and can therefore be used to predict and estimate it in real time from given air pollutant concentrations. This can help reduce the effects of air pollution on both humans and the environment.

Acknowledgments: C.K. wishes to thank S.K. and E.M. for their advice and guidance throughout the research process.

Conflicts of Interest:
The authors declare that there is no conflict of interest. The funders had no role in the design of the study; in the writing of the manuscript; or in the decision to publish the results.