Analytical Protocol to Estimate the Relative Importance of Environmental and Anthropogenic Factors in Influencing Runoff Quality in the Bumbu Watershed, Papua New Guinea

Abstract: The wellbeing, socio-economic viability and associated health of the inhabitant species of any ecosystem are largely dependent on the quality of its water resources. In this regard, we developed a protocol to measure the potential impact of various environmental and anthropogenic factors on runoff quality at 22 water sampling sites across the Bumbu Watershed in Papua New Guinea. For this purpose, we utilized Digital Elevation Models and several GIS techniques for delineation of stream drainage patterns, classification of the watershed based on Land Use/Land Cover, spatial interpolation of rainfall patterns and computation of the corresponding factor runoff. Our study concludes that a variety of potential challenges to surface water quality are possible, such as natural geologic and geochemical inputs, runoff accumulation of precipitation and organic matter pollutants. The developed protocol can also accommodate socio-economic factors such as community and household health, sanitation and hygiene practices, pollution and waste disposal. This research lays the foundation for further development of an all-inclusive correlational analysis between the relative importance values of the factors influencing runoff and spatially distributed water quality measurements in the Bumbu basin.


Introduction
As a result of rapid urbanization, population expansion and concomitant technological developments, substantial strain is being placed on the water systems around the world, making quality of water a matter of global significance [1]. The waters of an ecosystem have a symbiotic relationship with the health and welfare of the resident communities consuming it [2]. Study of the probable impacts of various sources on water quality has been elegantly summarized by a conceptual framework put forward by Granger et al. [3] that can arguably be described as the process of:
2. Identifying the means of mobilization of the sources;
3. Assessing the impact of delivery of the sources to receiving waters.
Previous studies have revealed that immediate landscape features and land use patterns significantly influence water quality [4][5][6][7][8][9][10][11]. Nevertheless, these topographical features are susceptible to alterations and are usually examined under Land Use/Land Cover Change (LULCC) initiatives. LULCC can be defined as a convoluted process of transformation of landscape and its related patterns of utility due to environmental and human-induced interactions [12]. These interactions in turn have a corresponding effect on the physicochemical composition of water. Given these interdependencies, water systems are best identified through river basins and watersheds that comprise a uniquely integrated hydrological network through which precipitation runoff flows into a specific larger body of water such as a river, lake or ocean. The network of streams across a drainage basin is the conduit for all precipitation deposited onto the catchment area in its journey to reach the sea. In passing through the catchment area, precipitation receives the detritus of human existence as well as the accumulation of geologic minerals dissolved and granulated to such a degree as to be mobile in the air and on the earth's surface [13]. These constituents in water are indicative of the impact of various environmental and anthropogenic parameters.
With respect to the aforementioned context, the Morobe Development Foundation (MDF) has undertaken a study to understand the mutual dependency that exists between waters of the Bumbu Watershed and the residents of communities of Lae in Papua New Guinea who rely on these waters for their continued existence and sustainability. Resident communities utilize water of the Bumbu river primarily for drinking and sanitation purposes, but simultaneously, they lack access to proper toilet facilities and a treated water supply. This problem is compounded by untreated sewage and the dumping of pollutants by industries established in the vicinity. As a result, water quality has become a serious concern in the region exposing human health to various risks in the form of skin infections and waterborne diseases [14]. The study seeks to fill a gap to address the risks to human health and security created by poor water quality by analyzing influence of different factors on the Bumbu basin.
The protocol developed in this study is designed to explore the relationship between the spatial distribution of diverse factors that have the potential to influence water quality and the spatial distribution of measured water quality. The use of spatial analysis and corresponding tools for this purpose takes a different approach from methods such as pH-Redox-Equilibrium modeling (PHREEQC), which is designed to explore the mechanistic relationship between water composition and the actual geological and hydrological conditions affecting it. Once an exploratory study using the protocol developed in this work is completed, a more detailed approach in the latter direction can be taken up for the Bumbu Watershed. To study this relationship, we utilized available Geographic Information System (GIS) methodologies to develop a protocol which not only measures the influence of anthropogenic and environmental factors on runoff but can also be extended to socio-economic parameters such as community health, pollution, waste disposal, crime and sanitation. The evolution of GIS has led to the advancement of analytical tools to comprehend the interrelationship between land use and the corresponding quality of water, which in turn has led to commendable contributions in watershed management [8][9][10]. In this initial research, we consider roads, streets, rainfall patterns, and forested and industrialized landscapes as examples of factors that present a possible correlation with runoff water quality. Data for such features often exist in a variety of formats familiar to spatial analysts, i.e., in vectorized and/or gridded databases depending on the data source (these formats are explained in more detail in later sections).
In pursuit of the above objective, a review of available literature, databases and analytic techniques has been conducted to gather relevant insights. Guoyu Xu et al. [4] studied the effect of multiple temporal and spatial scales on the quality of water across 32 sampling sites in the Wujiang River Watershed in China. They examined eight variables as possible indicators of quality, utilized Partial Least Squares (PLS) regression and found that quality was influenced by landscape configuration, composition and precipitation. The levels of Dissolved Oxygen (DO) were found to be higher in the dry season, and higher levels of other contaminants were found during the wet season. Only landscape level metrics in periods of rainfall were found to be related to organic matter. They also concluded that watershed buffer areas involving small patches of cropland with high aggregation of forested areas lead to better quality of water. Likewise, Xiao et al. [6] investigated the relationship between water quality and landscape metrics at multiple spatial scales in different seasons. For this purpose, they took into consideration 34 sampling sites across Huzhou City. They utilized stepwise regression and found that built-up land "has a role in influencing" water quality at a smaller scale, whereas at a local scale, multiple land use categories can be expected to impose an influence. Total Nitrogen (TN) was found to be negatively correlated with the index of built-up land, whereas the landscape index of forest was positively correlated with it. Moreover, Putro et al. [7] investigated the impact of land use pattern and climate on the quantity and quality of water across two urbanized catchment areas and two rural catchment areas located in the United Kingdom. Using multivariate regression models, they assessed the influence of rainfall and urbanization on the trends in the DO, runoff and temperature of the water network involved.
They found that temperature and dissolved oxygen variation with respect to catchment in urban areas are not driven by climatic variables. The temperature, total runoff and DO displayed an upward trend for urban catchments, but the same was not true for undeveloped catchments. In another study conducted by Lintern et al. [5], the authors studied existing literature to understand how spatial variability of landscape characteristics and interseasonal variation lead to variations in water quality. They analyzed different correlations that exist for different landscape characteristics including land use, geology, topography, hydrology, soil type and climate through a rigorous literature review. For example, their review revealed that rainfall is positively correlated with Electrical Conductivity (EC) for developed landscape factors such as urban areas, and negatively correlated with EC for undeveloped factors such as grasslands. Similarly, topography depicted by slope/elevation for undeveloped landscape was positively related with Total Suspended Solids (TSS), Total Phosphorus (TP) and Total Kjeldahl Nitrogen (TKN), whereas slope/elevation for developed landscape factors was negatively related to the same constituents. The authors stressed the need to consider the relation existing between the numerous catchment characteristics, impact of the spatial setting of different landscape features, interannual and interseasonal variability to comprehend the relationship existing between water quality and landscape features. Their paper revealed the wide range of factors that can influence runoff and highlighted the need to take into account a wide range of environmental data.
In this preliminary study, we elaborate how we measured the relative influence of the different factors indicative of environmental and anthropogenic impact on surface runoff at the respective water sampling sites using GIS tools and techniques. We also discuss how the factor runoff importance values can be used, and how the protocol can accommodate other parameters such as water, sanitation and hygiene (WASH) practices and other socio-economic factors in our future water quality studies. With the livelihood, health and welfare of the communities dependent on the waters of the Bumbu Watershed, it has become a necessity to explore the relationship that exists between the anthropogenic and environmental factors influencing runoff, WASH conditions, physicochemical analysis of the waters of the Bumbu and the corresponding water quality. This protocol is a first major step in this direction. In Section 2, we elucidate the methodology and the protocol involved based on the different formats and characteristics of the available factors. In Section 3, we present our results related to the computations of raw runoff and relative impact of the factors. In Section 4, we discuss our findings and our plans to utilize the protocol and its results in our upcoming water quality studies.

Study Area
Our investigation is centered around the Bumbu basin, located in Morobe Province in Papua New Guinea (PNG). Figure 1 illustrates the study area at different scales. The watershed is bounded on the west by the Markham river basin, and on the east by the Busu river basin. The Bumbu River traverses through Lae, the capital of Morobe Province and the second largest city in PNG. The river originates from the Atzera Range and is relatively narrow as it flows downstream at a medium pace. However, during the extreme rainfalls of the flooding season, the rate of flow is much higher [15], resulting in rapid erosion of the sandy loam that is the main constituent of the Bumbu floodplain [16]. All coordinates in this study are based on the World Geodetic System of 1984 (WGS84) datum.
Hydrology 2020, 7, x FOR PEER REVIEW 4 of 28

In total, 22 water sampling points spread across the Bumbu river basin were chosen for this research, as shown in Table 1. These sites lie at different locations and elevations, with varying levels of vegetation and urbanization. Water samples were collected at these points for further analysis, and their position coordinates were captured by GPS. The positions of these sampling sites with respect to the Bumbu river basin are shown in Figure 2. The sampling points on the Bumbu river are divided into three main categories, namely Bumbu main channel, left hand Bumbu stream and right hand Bumbu stream sampling points. The Station IDs of these points belong to the UA, UB and UC series, respectively. The captured GPS details can be found in Appendix A, Tables A1-A3.

Overview of the Protocol
Multiple geographically distributed factors such as rainfall, geophysical and geochemical conditions, human and animal populations, residential and commercial WASH conditions and practices, and general landscape use and conditions exist in the watershed [4][5][6][7][8][9][10][11][17][18][19][20]. These, together with roads, habitations and forests, potentially impact water quality and represent the social, economic and environmental (SEE) factors present. To analyze and understand the impact of these factors on water quality, it is imperative that they are assessed appropriately at water quality (WQ) sampling stations throughout the watershed. For almost all SEE variables, direct measurement of their impacts is practically impossible. Consequently, it is essential to incorporate into the analysis a systematic and consistent spatial interpolation protocol for estimating the accumulation of these factors at each WQ sampling station. The data for the factors taken into consideration in such studies usually exist in spatial formats such as raster layers and vector shapefiles. Vector GIS layers can be point-, line- or polygon-based. Vector features are carried within a shapefile using geographic latitude and longitude coordinates to define points, lines and edges of geometric shapes. Individual survey points lend themselves to point characterization, while boundaries lend themselves to line characterization. Buildings and other surface structures lend themselves to polygonal characterization. Rivers and streams can be characterized by lines or polygons, but the relevant hydrologic information can be conveyed by line vectors.
Because the datasets vary in format, we need to apply different techniques to extract appropriate runoff information for the distinct factors involved, based on whether each factor is described by line-, point- or raster-based GIS layers. Due to the lack of WASH-related data at the time of this exploratory study, we limited the application of the protocol to the measurement of the influence of runoff of anthropogenic and environmental factors. The data related to WASH parameters and other SEE variables are usually gathered by community surveys of households, and hence are restricted to estimation at point sources. In Section 2.5, we explain how point sources of information can be studied by considering the example of rainfall data in this format and demonstrating the use of spatial interpolation techniques. Figure 3 depicts the generalized process of runoff extraction using the different environmental and anthropogenic factors formatted in point-, line- and raster-based GIS layers. The resulting outputs include Flow Runoff, Road Runoff, Dense Forest Runoff, Green Space Runoff, Highly Urban Runoff, Habitation Runoff, Semi-Urban Runoff and Rainfall Runoff. The categorization based on environmental and anthropogenic factors is also shown. The procedures involved in extracting raw runoff are explained more thoroughly in Sections 2.3-2.6, and procedures for compiling relative importance for the factors are discussed in Section 3.1.

Line Vector-Based GIS Layers
This section explains the methodology to analyze factors which have line vector characteristics. In this context, we delineated the stream drainage network and extracted the pattern and associated values of road network runoff.

Dataset
The first stage of a drainage-based analysis is to procure a Digital Elevation Model (DEM) of the study area. As currently available stream data for the study area are insufficient to directly yield the required stream drainage information, it was determined that a DEM or Digital Surface Model (DSM) of high resolution would be required to achieve the project objectives. In this initial exploratory work, a 1 arc-second DEM supplied by USGS [21] with 30 m resolution was utilized. A DEM of higher resolution would be preferred but has proven difficult to obtain. The Shuttle Radar Topography Mission (SRTM), starting in February 2000, generated 1 arc-second DEMs with 30 m horizontal resolution ranging from 56° south latitude to 60° north latitude in gridded files encompassing 1° longitude by 1° latitude. The SRTM dataset has been shown to provide more accurate modelling than other datasets [22]. In many cases, a study area will include several of these files. In this study, the Lae region of Morobe Province is covered by two DEM files (S06E146.hgt.zip and S06E147.hgt.zip). After expansion and importation of the available band interleaved (BIL) formatted files into Idrisi formatted raster files (RST) in the GIS, the two rasters were merged into a single DEM. Visual examination of Google aerial photography of the study area around Lae suggested a suitable window of the expansive DEM to create a more manageable raster for further analysis.

Implementation
Runoff is the component of the water cycle in which surface water flows overland instead of infiltrating into the ground or evaporating. The GIS RUNOFF procedure can directly measure the catchment of every pixel in a grid scene under certain assumptions about the distribution of rainfall over the study area. We therefore utilized the "WATERSHED" and "RUNOFF" procedures of the IDRISI analysis package of the TerrSet software [23]. A critical element of watershed delineation is the location and vectorization of points at the lowest elevation of the watershed, i.e., at the mouth of the stream that has gathered all the streams of the catchment area into a single channel just before it empties into a larger body of water such as the sea or a larger river. In the case of the Bumbu, this point, also known as the pour point, lies at the entrance of the Bumbu Stream onto the Huon Gulf. The watershed delineation can be sensitive to the seed image provided as the lower extremity of the watershed. The CONTOUR function under "feature extraction" was found to be helpful in locating a proper seed image. The steps involved in a drainage analysis based on the line vector characterization of stream data are summarized as follows:
• Obtain the SRTM 1 arc-second DEM supplied by USGS with 30 m resolution and window the DEM to the study area. Using the DEM and the appropriate WATERSHED function of available GIS software, delineate the [WATERSHED] raster (see Figure 4a) and convert the raster to a watershed vector polygon (Figure 4b).
• Using the DEM bounded by the watershed polygon, apply the "RUNOFF" feature of applicable GIS software, delineate the raster of stream channels in the watershed and reformat the raster into stream line vectors with the individual stream catchment area as each vector's feature value. Retain this raster layer and shapefile as a relative measure of the [FLOW RUNOFF] traversing each pixel in the catchment area. A basic assumption of this step is that overall, and on average, precipitation, infiltration and absorption are spatially and temporally uniform across the watershed. This is a first order assumption. Extension of the protocol to incorporate spatially variable precipitation is considered in Section 2.5 below to address some of the shortcomings in this assumption.
• Acquire a shapefile of roads and streets in the project area from Open Street Map [24] and convert the road shapefile to a raster format on a blank raster of the same location and dimensions as the DEM. RECLASS all non-zero road pixels as 1 on a 0 background.
• Using the elevation [DEM] and the "WATERSHED" function of the GIS software, with the road raster overlain as the precipitation image, again collect runoff of the catchment area. This can be retained as the [ROAD RUNOFF] layer. Again, the assumption is that all roads have an equal pollution potential per pixel.
It is important to note that runoff units are all in pixels, where each pixel is equivalent to 977.21 m². The assumption is that one unit of precipitation falls on each pixel unit of the watershed or, in the case of categories, on each pixel unit of the category. In the case of differential precipitation, fractional precipitation is assumed to fall on each pixel. The runoff procedure accumulates pixels into the stream network as a proxy for actual precipitation.
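The accumulation idea behind the steps above can be illustrated with a minimal sketch. It is not the IDRISI RUNOFF algorithm, whose internals are not described here; it assumes a simple D8 rule in which each pixel passes its accumulated total to its steepest downslope neighbour. The function names, the toy 3 by 3 DEM and the toy road raster are all hypothetical; only the pixel area of 977.21 m² is taken from the text.

```python
PIXEL_AREA_M2 = 977.21  # pixel area stated in the text


def accumulate_runoff(dem, weights):
    """Accumulate per-pixel weights downslope with a simple D8 rule.

    dem     : list of rows of elevations (toy grid, hypothetical values)
    weights : list of rows of "precipitation units" per pixel
              (all 1s for flow runoff, a 1/0 raster for road runoff)
    """
    rows, cols = len(dem), len(dem[0])
    acc = [[float(w) for w in row] for row in weights]
    # Visit pixels from highest to lowest so that every upstream pixel is
    # final before its total is passed to the neighbour it drains into.
    order = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for z, r, c in order:
        best_drop, target = 0.0, None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = (dr * dr + dc * dc) ** 0.5
                    drop = (z - dem[nr][nc]) / distance
                    if drop > best_drop:  # strictly downslope only
                        best_drop, target = drop, (nr, nc)
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc


def pixels_to_hectares(pixel_count):
    """Convert a runoff value in pixels to hectares."""
    return pixel_count * PIXEL_AREA_M2 / 10_000


# Toy DEM draining towards the lower-right corner (the "pour point").
dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 5, 1]]
uniform_rain = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # one unit per pixel
road_raster = [[1, 0, 1], [0, 0, 0], [0, 0, 0]]   # 1 = road pixel

flow_runoff = accumulate_runoff(dem, uniform_rain)
road_runoff = accumulate_runoff(dem, road_raster)
```

Reading `flow_runoff` or `road_runoff` at the row and column of a sampling station gives that station's raw runoff value in pixels, which `pixels_to_hectares` converts to an area.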

Raster Based GIS Layers
Similar techniques can be applied to capture the relative potential of other layers to correlate with water sampling station results. This section elaborates on how we rely on aerial photography and/or satellite imagery to create vegetative density and urbanization layers for similar runoff extraction at sampling stations. High-definition aerial photography and satellite imagery of the study area are available from USGS Earth Explorer [21]. Categorization of landscape elements in imagery can be accomplished using various techniques of cluster and classification analysis. In the aerial photography shown in Figure 5, it can be readily seen that the study area varies from what appears to be a mature virgin forest to a highly industrialized urban environment. A study by Doaemo et al. [25] revealed that the Bumbu Watershed has undergone extensive deforestation and an increase in urbanization in the last 33 years. In this instance, we settled on five arbitrary but intuitively selected categories as relevant to the water quality study: (1) dense forest, (2) regen (regenerating) forest, (3) green space, (4) semi-urban and (5) highly urban environments. The land-use types are largely self-explanatory with the exception of "green space". This land-use category arose as a result of aerial photo interpretation of the landscape. The "green space" characterization was designed to differentiate between land primarily characterized by vegetation in various stages of tree growth (mature and regenerating forest) and non-vegetated land (designated urban classes). Close inspection of these vegetated areas in aerial photography revealed extensive garden cultivation of otherwise vacant land. The proximity of these garden plots to highly urban and semi-urban areas suggests these areas are used extensively for food production.
Prototypes for the various groups were envisioned. Sampling of the prototypes was accomplished by identifying points in areas assumed to be prime examples of the proposed classes. We identified thirty sample points per class, and the sample points were saved as a shapefile and then rasterized on a raster of the same dimension and location as DEM. Sample points were expanded to rectangles of 5 by 5 pixels covering approximately 2.7 hectares each and converted back to vector shapefile polygons. The distribution of signature polygons is shown in Figure 5.
These polygons constitute the sampling areas to be superimposed on the satellite imagery for the development of class signature profiles. In this study, signature profiles were developed by sampling the individual color bands of the satellite imagery of Hansen et al. (2013) [26]. The profiles/signatures that were developed were then used to hard classify the entire study area using maximum likelihood estimation (MLE) for final classification of the watershed as shown in Figure 6.
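To make the signature and hard-classification idea concrete, the following is a minimal sketch of Gaussian maximum likelihood classification. It is an illustration under stated assumptions, not the GIS package's implementation: each band is treated as independent (a diagonal covariance), and the class names, function names and RGB training values are hypothetical.

```python
import math
from statistics import mean, pvariance


def train_signatures(samples):
    """Build a per-class signature: per-band mean and variance.

    samples: {class_name: [(r, g, b), ...]} training pixels per class.
    """
    signatures = {}
    for name, pixels in samples.items():
        bands = list(zip(*pixels))  # regroup pixel tuples into band lists
        signatures[name] = (
            [mean(band) for band in bands],
            # A small floor keeps every variance strictly positive.
            [max(pvariance(band), 1e-6) for band in bands],
        )
    return signatures


def log_likelihood(pixel, signature):
    """Gaussian log-likelihood of a pixel, assuming independent bands."""
    means, variances = signature
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(pixel, means, variances)
    )


def classify(pixel, signatures):
    """Hard classification: the class with the highest likelihood wins."""
    return max(signatures, key=lambda name: log_likelihood(pixel, signatures[name]))


# Hypothetical training pixels for two of the five land-use classes.
training = {
    "dense_forest": [(30, 80, 35), (28, 85, 30), (35, 78, 40), (32, 82, 33)],
    "highly_urban": [(180, 175, 170), (190, 185, 182), (175, 170, 165), (185, 180, 178)],
}
signatures = train_signatures(training)
label = classify((33, 81, 36), signatures)
```

A full implementation would normally use the complete band covariance matrix of each class; the independent-band simplification is made here only to keep the sketch short.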
Each land-use class was individually coded as a categorical variable and mapped as a separate layer. As discussed in the example of Section 2.3.2, a Roads and Streets line vector shapefile available from Open Street Map [24] was similarly dummy coded and transformed into a categorical gridded map layer. Subsequently, a separate Population/Habitation layer was extracted from aerial photography by filtering pixels exhibiting high reflectance values >90 for all three RGB bands. The high reflectance was assumed to be the sun's reflection from metal rooftops. This layer was deemed advisable as a secondary measure of human habitation and human activity that might be missed by other urban classifications. Results of the grid transformations and categorizations are shown in Figure 7a,b.
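The dummy coding and the reflectance filter described above can be sketched as follows. This is a hedged illustration: the function names, the class numbering and the toy rasters are hypothetical, and the threshold of 90 is applied to whatever scale the RGB bands are stored in, which the text does not specify.

```python
def dummy_code(class_raster, class_id):
    """Return a 1/0 layer marking the pixels that belong to class_id."""
    return [[1 if value == class_id else 0 for value in row]
            for row in class_raster]


def habitation_mask(rgb_raster, threshold=90):
    """Return a 1/0 layer of pixels whose R, G and B all exceed threshold."""
    return [[1 if all(band > threshold for band in pixel) else 0
             for pixel in row]
            for row in rgb_raster]


# Toy classified raster using hypothetical class codes 1..5.
classified = [[1, 1, 3],
              [2, 5, 5]]
highly_urban_layer = dummy_code(classified, 5)

# Toy RGB raster; bright pixels stand in for reflective metal rooftops.
rgb = [[(200, 210, 205), (40, 90, 30)],
       [(95, 91, 120), (91, 90, 200)]]
habitation_layer = habitation_mask(rgb)
```

Each 1/0 layer produced this way can serve as the precipitation image in the runoff step, in the same way as the road raster.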
Hydrology 2020, 7, x FOR PEER REVIEW 10 of 28 to rectangles of 5 by 5 pixels covering approximately 2.7 hectares each and converted back to vector shapefile polygons. The distribution of signature polygons is shown in Figure 5. These polygons constitute the sampling areas to be superimposed on the satellite imagery for the development of class signature profiles. In this study, signature profiles were developed by sampling the individual color bands of the satellite imagery of Hansen et al. (2013) [26]. The profiles/signatures that were developed were then used to hard classify the entire study area using maximum likelihood estimation (MLE) for final classification of the watershed as shown in Figure 6.  Each land-use class was individually coded as a categorical variable and mapped as a separate layer. As discussed in the example of Section 2.3.2, a Roads and Streets line vector shapefile available from Open Street Map [24] was similarly dummy coded and transformed into a categorical gridded map layer. Subsequently, a separate Population/Habitation layer was extracted from aerial photography by filtering pixels exhibiting high reflectance values >90 for all three RGB bands. The high reflectance was assumed to be the sun's reflection from metal rooftops. This layer was deemed advisable as a secondary measure of human habitation and human activity that might be missed by

Point Vector Based GIS Layers
A third scenario considers the case where only point estimates of important socio-economic or environmental variables are available. Such is the case for rainfall and other weather-related variables measured at individual sampling stations. Thus, far into the development of a protocol, only spatially and temporally uniform rainfall across the watershed was assumed. In reality, rainfall varies spatially and temporally. Such data require interpolation to landscape coverage for analysis using the methods described in Section 2 above for aerial and satellite imagery. A spatially diverse rainfall pattern is a good example for general application. Unfortunately for this study, only sparse rainfall weather station data are available for the Bumbu Watershed. Given the sparseness of available data, estimates of the spatial pattern of rainfall for the Lae area resulted in a rainfall mapping with substantial uncertainties associated with the estimated spatial pattern. For the purposes of protocol development, these large uncertainties in the estimates of the spatial and temporal distribution of rainfall will be ignored.  After categorization of the watershed, multi-category rasters are converted into individual single category feature rasters coded 1/0. The roads and population/habitation rasters are similarly recoded 1/0. The procedure from this point follows the same procedure described in previous section for roads. We utilized the DEM to accumulate [CLASS# RUNOFF] for each class. Next, by overlaying the [SAMPLING POINT] raster onto each [CLASS# RUNOFF] raster, a class# runoff value is assigned to each sampling point and saved in an attribute values file for later incorporation into correlational analyses along with other sampling station results. It is again useful to convert the rasters to [CLASS# RUNOFF] line vector shapefiles and point vector shapefiles for graphic presentation of results.

Spatial interpolation is a well-researched field in geology, where point samples of geologic formations must be used to interpolate and reliably estimate the amount and value of mineral resources. Methods of spatial interpolation include Triangulated Irregular Network (TIN) and Kriging. The sparseness of our sample points failed to satisfy the Kriging requirement of a sufficient number of sample points to estimate spatial autocorrelation. Thus, for the purposes of this study, the less demanding TIN method was employed. The current study confines itself to consideration of the spatial variation of average annual rainfall. At present, weather station data are too sparse to reliably estimate the spatial variation of rainfall across the Bumbu Watershed. Historically, the situation is slightly better. McAlpine et al. (1975) [27] reported results of a 15-year study at 600 weather stations across mainland PNG and the islands.
Though the McAlpine data are out of date and climate patterns are changing, they represent the best currently available estimate of the spatial pattern of rainfall in the study area, even if absolute amounts of annual rainfall have changed.
TIN construction and TIN surface estimation are standard features of GIS packages. Thirteen weather stations in the vicinity of Lae from the McAlpine study, shown in Figure 8a, were used for the current study. Using the 13 McAlpine point estimates of average annual rainfall for 1957-1972, a TIN and TIN surface were compiled to estimate the pattern of spatial variation across the Bumbu Watershed study area, as illustrated in Figure 8b. The rainfall surface was generated on a grid compatible in location and resolution with the watershed DEM used for the previous RUNOFF analyses. It is convenient to convert the raw rainfall into a grid coded 0 to 1 as the fraction of maximum expected annual rainfall across the watershed. The scaled RAINFALL surface and contours are shown in Figure 8b.
Other examples of spatially and temporally distributed variables, measured or estimated at point sources, are the results of geo-located population surveys of water, sanitation and hygiene (WASH) practices. In upcoming studies, MDF will undertake community surveys for these variables in proximity to the same 22 water sampling points of this study in order to examine their relation to runoff water quality.
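The TIN surface and the 0-to-1 rescaling of rainfall described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GIS workflow used in the study: the station coordinates, rainfall totals and triangle list are made-up placeholder values (they are not the McAlpine data), the function names are hypothetical, and the triangulation is assumed to be supplied, for example by a GIS package. Within each triangle the surface is the plane through the three station values, evaluated with barycentric weights.

```python
# Illustrative only: four hypothetical stations (x, y in km) with made-up
# average annual rainfall (mm). These are NOT the McAlpine station values.
STATIONS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
RAINFALL = [4000.0, 4400.0, 3600.0, 5000.0]
# Assumed triangulation: index triples into STATIONS (normally produced by
# a GIS package or a Delaunay routine).
TRIANGLES = [(0, 1, 2), (1, 3, 2)]


def barycentric(p, a, b, c):
    """Return the barycentric weights of point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def tin_value(p, eps=1e-9):
    """Linear TIN estimate at p, or None if p lies outside every triangle."""
    for i, j, k in TRIANGLES:
        weights = barycentric(p, STATIONS[i], STATIONS[j], STATIONS[k])
        if all(w >= -eps for w in weights):
            values = (RAINFALL[i], RAINFALL[j], RAINFALL[k])
            return sum(w * v for w, v in zip(weights, values))
    return None


def rainfall_grid(nx, ny, x_max=10.0, y_max=10.0):
    """Evaluate the TIN surface on an nx-by-ny grid of points."""
    return [[tin_value((x_max * c / (nx - 1), y_max * r / (ny - 1)))
             for c in range(nx)] for r in range(ny)]


def scale_to_fraction_of_max(grid):
    """Recode raw rainfall as a 0-to-1 fraction of the grid maximum."""
    peak = max(v for row in grid for v in row if v is not None)
    return [[None if v is None else v / peak for v in row] for row in grid]


grid = rainfall_grid(5, 5)
scaled = scale_to_fraction_of_max(grid)
```

In a real application the grid would be aligned with the DEM cells rather than with an arbitrary 5 by 5 lattice, and cells outside the triangulated area (returned here as `None`) would need an explicit no-data treatment.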

Observed Limitations and Rectifications
No major impediments to using the protocol appear to exist except for the data limitations explained below. The value of the protocol will emerge with its application to correlation analysis of water quality measurements with the derived inputs. At this time, the imprecision of the DEM necessitated estimating the locations of some sampling stations on the derived stream lines. Where there was a discrepancy between derived stream lines and actual streams, sampling station points were placed at estimated locations of equivalent hydrologic position. The uncertainties created by this process are unknown at this point. Figure 9 shows the differences between the actual positions of the 22 sampling sites and their estimated "equivalent" hydrologic positions on the DEM. Sites 3, 12, 13, 14 and 18 required the greatest adjustment and are highlighted in the figure.
In practice, it was found that high resolution aerial photography, down to 1 meter pixel resolution as portrayed on the USGS Earth Explorer [21], was superior to satellite imagery for identifying appropriate locations for signature polygons. These images are not always available for download but can nevertheless be used for informal geo-location. Landsat imagery proved superior for signature definition and more exact geo-location. There was a tendency for satellite imagery to incorrectly identify streambeds and water bodies as "highly urban" and "semi urban". Attempts to create a sixth signature and category for water were unsuccessful, but masking out the stream layer obtained by DEM analysis partially compensated for this shortcoming. The results presented are based on the use of the 5 land-use categories for categorization of 4-band satellite imagery as compiled by Hansen et al. (2013) [25]. Refinement for any application can explore what is most appropriate in that specific study scene. For the purposes of this protocol development, no "ground truthing" of the categorizations was performed other than verification against aerial photography.

Relative Importance of Factors
The concluding step in creating layers for incorporation into correlational analysis is the compilation of the "Relative Importance Value" of the RUNOFF layers. The negative (and/or positive) influences of the runoff of socio-economic environmental (SEE) factors are diluted by the amount of water flowing through the same sample site. As mentioned before, runoff units are all in pixels, where each pixel is equivalent to 977.21 m². The assumption is that one unit of precipitation falls on each pixel unit of the watershed or, in the case of categories, on each pixel unit of the category. After compilation of runoff, factor importance values are found by dividing the factor runoff by the rainfall runoff and multiplying by 100. The formal equation for this calculation can be given by:

$$ IV_{f,s} = \frac{R_{f,s}}{R_{\mathrm{rain},s}} \times 100 $$

where $IV_{f,s}$ is the relative importance value of factor $f$ at sampling site $s$, $R_{f,s}$ is the accumulated runoff of factor $f$ at that site and $R_{\mathrm{rain},s}$ is the accumulated rainfall runoff at the same site.
In practice, it is more convenient to extract RUNOFF values from all layers in one step and to perform the "importance value" calculation in a spreadsheet. As mentioned earlier, a [SAMPLING SITE] layer was created by projecting the sampling site locations onto a raster with the same dimensions and location as the SRTM 1 arc second DEM supplied by USGS [21].
Table 2 lists the raw runoff at the 22 water sampling sites across the Bumbu Watershed. The relative impact of each factor must be understood in the context of the surface water's travel history and the varying level of influence each factor can have on runoff as it moves downstream. In most cases, it is not feasible to directly measure the exact quantity and potential impact of the anthropogenic and environmental inputs at the sites where water quality is measured.
It is important to note that the potential impacts of the different factors are present in the form of several physical and chemical constituents carried by the Bumbu river. As precipitation moves across the landscape and downstream to the sea, it accumulates and transports the potential impacts of these factors in the runoff surface water and in the percolating ground water. This potential impact is diluted by the amount of water involved in transport. The best that we can do is to infer the relative presence and concentration of these factors at the 22 spatially distributed water quality sampling stations and attempt to assess the correlation of their relative presence with accepted measures of water quality. In this regard, we utilized normalized rainfall runoff values to calculate the impact of these different factors on surface water; the results are enumerated in Table 3.
…, UC5 and UC6 lie in the urban region of the city of Lae in the downstream area of the Bumbu Watershed and have high measurements for highly urban and semi-urban runoff values. The Bumbu river at these sites receives effluents from different industries located nearby, human detritus from residential areas, pollutants, waste discharge and other forms of organic matter (Figure 2) [14]. Other physicochemical characteristics of water that can be inspected for correlation with importance values at these points for the UA, UB and UC series include Total Suspended Solids (TSS), coliforms, alkalinity, pH and temperature. Thus, the Bumbu river experiences a wide range of influences from different landscape classes that can affect its water quality as its surface water moves from the upstream to the downstream regions of the watershed.
To accurately examine the relationship of the relative importance values with physicochemical characteristics and with other social and economic parameters, a thorough correlation analysis is required, the prospects of which are elaborated in the Discussion section.
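The importance value calculation described in this section (factor runoff divided by rainfall runoff, multiplied by 100) can be sketched as follows. This is an illustrative Python equivalent of the spreadsheet step, not the authors' implementation: the site labels, factor names and runoff counts are hypothetical placeholders, not values from Table 2 or Table 3.

```python
def importance_values(factor_runoff, rainfall_runoff):
    """Return {site: {factor: importance value in percent}}.

    factor_runoff:   {site: {factor: accumulated factor runoff in pixels}}
    rainfall_runoff: {site: accumulated rainfall runoff in pixels}
    """
    result = {}
    for site, factors in factor_runoff.items():
        rain = rainfall_runoff[site]
        if rain <= 0:
            raise ValueError(f"rainfall runoff must be positive at {site}")
        # Importance value = factor runoff / rainfall runoff * 100
        result[site] = {name: 100.0 * value / rain
                        for name, value in factors.items()}
    return result


# Hypothetical runoff counts (pixels) at two made-up sites.
example_factor_runoff = {
    "S1": {"dense_forest": 1500.0, "highly_urban": 20.0},
    "S2": {"dense_forest": 1000.0, "highly_urban": 1250.0},
}
example_rainfall_runoff = {"S1": 2000.0, "S2": 5000.0}

example_iv = importance_values(example_factor_runoff, example_rainfall_runoff)
```

With these placeholder numbers, the dense forest factor accounts for 75% of the rainfall runoff at the hypothetical site S1 but only 20% at S2, which illustrates how the same raw runoff scale yields different relative importance once diluted by the water passing each site.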

Discussion
The protocol developed in this study follows and expands upon the conceptual framework of Granger et al. [3] in identifying potential sources of impact on water quality, their mobilization by rainfall and their delivery across the landscape by a stream network. The methodologies presented in this study facilitate assessment of the relative importance/concentrations of different environmental and anthropogenic factors, enabling investigation of their potential impact on runoff water quality. Initially, the factors considered for this purpose included Roads/Streets, Dense Forests, Regenerating Forests, Green Space, Highly Urban, Semi-Urban and Habitation/Population. Different techniques and tools were then applied to each category according to whether it was characterized by line, point or raster-based GIS layers. The raw runoff values for the various factors were computed using the DEM, spatial interpolation, classification methods and GIS software functionalities such as the WATERSHED and RUNOFF features. The raw runoff values for each of the factors at the 22 water sampling sites are listed in Table 2. As stated in the previous section, we made use of normalized rainfall runoff values to determine the potential impact of the various factors. Figures 11 and 12 show the spatial distribution of the relative importance values of the factors involved, and Table 3 lists the corresponding importance values at the 22 sampling sites. For instance, at Sampling Site number 22, the runoff importance is negligible for the several factors indicative of varying vegetation density and is significantly greater for highly urban and semi-urban areas. This would imply a higher impact of urban areas on surface runoff and would indicate a greater concentration/impact of constituents contributed by urban factors. Similar observations can be made at other water sampling sites.
One can observe that the runoff at each of the 22 sites is subject to considerable and varying influences from a variety of factors. This indicates the range of potential challenges to runoff quality in the form of organic matter, coliforms, geochemical and natural ingredients, to name a few. However, to explore the exact nature of the relationship between these factors and the waters of the watershed, a detailed correlation analysis is needed, which we discuss in more detail below.
In our upcoming studies, we plan to examine how the runoff importance values for different factors are associated with the physical and chemical composition of the Bumbu waters by performing a detailed water quality study. This would include collection of water samples at the 22 sites followed by physicochemical analysis. Furthermore, a factor analysis/principal component analysis (FA/PCA) can be performed, as in earlier studies [28][29][30][31]. This would help us understand several relationships with respect to the waters of the Bumbu Watershed, for instance, whether and to what extent Total Suspended Solids (TSS) and turbidity are correlated with surface runoff due to rainfall events. Another example could be measuring the variation of Dissolved Oxygen (DO), Electrical Conductivity (EC) and Thermotolerant Coliforms (TC), which are related to organic matter pollution. Other examples would include evaluating how pH, alkalinity, temperature and metallic ions vary with different levels of vegetation and urbanization in the vicinity of the sampling points.
Measuring the water quality index at the water sampling sites using available standards and guidelines, such as those drafted by the Canadian Council of Ministers of the Environment (CCME), will also be a necessary step to comprehend how quality of water is related to the runoff importance values computed in this study [32]. We also plan to undertake a community-wide household survey to gather relevant WASH-related data such as sources of drinking water, toilet facilities, water storage and waste disposal methods used in proximity to our 22 water sampling sites. Additionally, we intend to gather community data based on parameters such as health, crime and pollution. Health-related parameters may include variables such as the presence of stomach ailments, skin infections, HIV/AIDS and respiratory illness. Although crime and HIV may seem to be unlikely candidates for correlation with water quality, the study's fundamental objective is to provide a method to explore whether such non-intuitive correlations exist, and to provide a reasonable protocol for exploration, analysis and resolution of the complex questions they raise. As mentioned earlier, these types of data are constrained to be point sources of information and can be utilized easily by our protocol, as implemented in the case of rainfall.
All these inputs, together with the anthropogenic and environmental factors, form an array of socio-economic environmental (SEE) inputs. Finally, we intend to perform a correlation analysis to determine how the runoff importance values calculated in this study vary with accepted measures of water quality, e.g., the Canadian Council of Ministers of the Environment Water Quality Index, with different SEE factors and with physicochemical parameters of water. Several analytical techniques such as Pearson and Spearman correlation, numerous regression models and other statistical tools are available and have been used previously in this regard [4][5][6][7][8][9][28][29][30][31].
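As a rough illustration of the planned correlation step, the sketch below computes Pearson and Spearman coefficients between an importance value and a water quality measurement. It is an assumption-laden example rather than the analysis itself: the two series are synthetic placeholders (no water quality measurements are reported in this study), the variable names are hypothetical, and the functions are written out in plain Python only to keep the example self-contained.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic placeholder series for five imaginary sites (not measured data):
# a highly urban importance value (%) and a TSS concentration (mg/L).
urban_importance = [1.0, 5.0, 12.0, 30.0, 45.0]
tss = [3.0, 4.0, 9.0, 20.0, 60.0]

r_pearson = pearson(urban_importance, tss)
r_spearman = spearman(urban_importance, tss)
```

Because the placeholder series increase together but not linearly, the Spearman coefficient equals 1 while the Pearson coefficient is lower, which is one reason both measures are worth reporting when the form of the relationship is unknown.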
Our study is bound by data limitations such as the reliance on data collected by McAlpine et al. (1975) [27]. Although climate patterns are changing, the McAlpine data represent the best currently available estimate of rainfall patterns across Morobe Province for performing spatial interpolation for the Bumbu basin. Another limitation is imposed by the use of a 30 m DEM, despite our efforts to obtain a DEM of higher resolution for the watershed. Due to the imprecision of the DEM, there is a discrepancy between derived stream lines and observed streams, a shortcoming that we attempted to rectify as explained in Section 2.6. Despite these limitations, the uniqueness of our approach lies in the fact that the protocol not only takes into account various environmental and anthropogenic factors but also has the potential to accommodate socio-economic factors such as community and household health, crime and waste disposal.

Conclusions
Unrestricted urban expansion, burgeoning population and industrialization, among various other anthropogenic factors, have put significant stress on water resources all over the world [1]. In this context, the Morobe Development Foundation (MDF), a not-for-profit community-based organization located in Lae, Morobe Province, Papua New Guinea, has undertaken research to understand the symbiotic relationship that exists between the water systems of the Bumbu basin and the health and welfare of the resident communities. This study seeks to facilitate a capacity to link the quality of surface waters to multiple social, economic and environmental factors, differentially and geographically distributed, by tracing the travel history of surface waters through varying landscapes. These include urban, semi-urban, dense forests, green space and regenerating forests. For this purpose, we developed an analytical protocol to determine the potential impact the above-mentioned factors can have on surface water quality. This is essential for our future work and a positive contribution to the study of water quality in general. It is worthwhile to mention that the protocol can be applied to many factors in addition to those mentioned in this research, based on their underlying point, line or raster-based characteristics. Although previous studies [4][5][6][7][8][9][10][11][18,20] consider a variety of factors, the uniqueness of the protocol is in its examination of the relationship between quality of water, physicochemical characteristics, WASH practices [17,19] and socio-economic parameters. The protocol will help us to understand the relationship that exists between the above parameters and the inhabitants of the Bumbu Watershed, a study that is the first of its kind for the region. Consequently, this will provide relevant insights to aid our upcoming projects, which are aimed at addressing probable risks to human health created by poor water quality in the Bumbu basin [14].
We utilized a diversity of spatial analytical tools and techniques to responsibly analyze an array of environmental and anthropogenic data inputs possessing the potential to impact water quality. The study confirms the value of the conceptual framework put forward by Granger et al. [3] to identify potential sources of impact, their mobilization and their delivery to water systems. For our future studies, the practical value of the protocol will be tested in its ability to (a) interpret, interpolate and estimate appropriate values of diverse inputs of other factors associated with water quality as measured at WQI sampling stations; (b) provide the tools to estimate the spatial distribution of variables reported in household surveys; and (c) appropriately estimate the importance values of those survey parameters at the WQI sample sites. The protocol and its procedures described in this paper are general enough to be applicable in other geographical settings to determine the influence of runoff of a variety of factors, and to calculate the relative importance and correlation of these factors with other SEE parameters. Through this study, we intend to bring the protocol and its applications to wider notice, attract the attention of international researchers and volunteers, and simultaneously garner local support for our future work.