A Complete Environmental Intelligence System for LiDAR-Based Vegetation Management in Power-Line Corridors

Abstract: This paper presents the first complete approach to achieving environmental intelligence support in the management of vegetation within electrical power transmission corridors. Contrary to the related studies that focused on the mapping of power lines, together with encroaching vegetation risk assessment, we realised predictive analytics with vegetation growth simulation. This was achieved by following the JDL/DFIG data fusion model for complementary feature extraction from Light Detection and Ranging (LiDAR) derived data products and auxiliary thematic maps that feed an ensemble regression model. The results indicate that improved vegetation growth prediction accuracy is obtained by segmenting training samples according to their contextual similarities that relate to their ecological niches. Furthermore, efficient situation assessment was then performed using a rasterised parametrically defined funnel-shaped volumetric filter. In this way, RMSE ≈ 1 m was measured when considering tree growth simulation, while a 0.37 m error was estimated in encroaching vegetation detection, demonstrating significant improvements over the field observations.


Introduction
As electrification is becoming a pillar of social [1,2], economic [3] and environmental sustainability [2,4,5], power transmission lines are under increasing burden. While their performance monitoring and management have long been addressed through the concepts of so-called Smart Grids with the Internet of Things [6], the benefits of environmental intelligence are still to be explored when addressing their physical safety [7]. As 30% of power outages are reportedly caused by weather conditions [8], 90% of which are attributed to tree-related incidents [9], vegetation management in power line corridors remains a major challenge [7,8]. While it has already been shown that a 6% improvement in reliability and a 9% reduction in total costs are possible by optimising tree trimming tasks alone [10], utilities still spend millions of dollars on vegetation management every year, making it one of the costliest activities in distribution asset management [7]. In addition, new challenges are now emerging related to the mitigation of long-term negative impacts on biodiversity and ecosystems' sustainability [11,12]. Accordingly, digitalisation in vegetation management has been explored increasingly in the last decade, within which the use of Light Detection and Ranging (LiDAR) has gained considerable attention [13].
Early research into the subject was directed towards the extraction [14][15][16][17][18] and 3D reconstruction [19][20][21][22][23] of power lines from LiDAR data. While, traditionally, these methods achieve mapping of pylons, followed by recognition of wires, more recent approaches focus on improving their performance [24] and on the extraction of more detailed information, such as the reconstruction of bundle conductors [25]. With reported accuracies of above 90%, contemporary methods can provide adequate support in power-line mapping tasks. However, assessment of power-line corridor clearance has, until recently, been a less frequently addressed research topic [13]. Despite the early recognised potential [26], efficient automatic LiDAR-based detection of clearance hazards (such as tree encroachment) has been reported only recently, with clearance measurement accuracy at the decimetre level [27] and over 95% accuracy of power-line and vegetation recognition for hazard detection [28]. Still, as argued in [13], fusion of multiple data sources can provide further benefits by reducing the monitoring costs, as well as improving temporal resolution (e.g., by fusion of aerial images [29]). In addition, data fusion can enhance monitoring with prediction capacities, as explored very recently with statistical predictions of tree-related power outages based on historical and weather data [30].
While predictive analytics is, thus, emerging as a new trend in effective power-line corridor management, improved vegetation growth simulations are needed that are tuned to the exact ecological niches under inspection. Although many studies were conducted on the possible use of LiDAR data in vegetation [31,32] and forest management [33][34][35], their primary focus remained on mapping the state of vegetation rather than using it for automated regression of vegetation growth. A complete data fusion stack has not yet been introduced, despite several studies that have shown the possible use of LiDAR for this purpose [36][37][38].
In this paper, we propose a new approach for achieving Environmental Intelligence in vegetation management using structured data fusion of LiDAR-generated data products with complementary thematic maps and administrative data sources, i.e., development of a digital twin [39]. Accordingly, the key scientific contributions of the paper are as follows:
• A complete LiDAR data processing pipeline for fusion of the derived data products (like digital terrain models, canopy height models and 3D data about power lines) with cadastral data and other important thematic maps for vegetation management, such as the distribution of tree species and soil pH maps,
• An efficient approach for encroaching vegetation detection that enables accurate assessment of corridor clearance and provides future threat assessment, and
• A new data segmentation approach for learning vegetation growth simulation, with weak predictors tuned to specific ecological niches.
The rest of the paper is organised as follows: A new methodology for vegetation management is proposed in Section 2. Its results are presented in Section 3, while Section 4 concludes the paper.

Study Area and Data Preparation
In order to account for various testing conditions, an 18 km long corridor of the Slovenian national power transmission grid was selected. The corridor extends from the city of Nova Gorica to the town of Avče, and, thus, spans from the Sub-Mediterranean to the Alpine climate, and, accordingly, contains diverse forest stands with highly versatile terrain configurations. The terrain is also characterised by different soil qualities and soil pH levels, as well as sunlight conditions. In total, the power line corridor contains 104 power cables with a total span of approximately 168 km.
For the purposes of this study, two LiDAR data acquisitions were conducted, in 2014 and 2018. As shown in Figure 1, the whole study area was divided into 1 × 1 km tiles covering a total area of 24 km², while the protected area of the power-line corridor covered 2.72 km². In addition to LiDAR data, the auxiliary data sources used in this study are reported in Table 1.
In this section, a complete data processing framework is presented for fusion of LiDAR-derived data (e.g., digital terrain and canopy height models) with auxiliary data sources (e.g., cadastre data and forest species maps). Here, we followed the JDL/DFIG (Joint Directors of Laboratories/Data Fusion Information Group) model, which is considered to be a de facto standard reference model for assessing features from heterogeneous data sources and streams. As shown in Figure 2, it prescribes feature modelling over the following levels [40,41]:
• Level 0-Source preprocessing, where key LiDAR data products were generated, and their spatio-temporal data alignment with auxiliary thematic maps was achieved;
• Level 1-Object assessment dealt with the definition of individual trees, their features, as well as the features of power-lines;
• Level 2-Situation assessment provided encroaching vegetation detection and risk assessment features;
• Level 3-Threat assessment integrated tree-growth predictions for the assessment of risk prognosis features;
• Level 4-Process refinement dealt with the management of other levels, recorded performance of the system, provided adaptive data acquisition and made decisions on how to improve the system efficiency;
• Level 5-User refinement dealt with knowledge management and visual analytics to support decision-making; while
• Level 6-Asset management, in our case, provided task scheduling by also considering available resources, legal constraints, and other operational factors.
As follows from the above, level 4 addressed overall system optimisations, while levels 5 and 6 were knowledge and vegetation management levels. Accordingly, the remainder of this section addresses levels 0 to 3, which provided environmental intelligence in support of these tasks.

Level 0-Data Preprocessing
With the objective of providing an accurate description of the current status of the vegetation in power-line corridors, and, thus, supporting higher analytics levels, spatio-temporal alignment of input datasets was achieved first. For this purpose, a digital terrain model (DTM) and canopy height models (CHM) were generated from LiDAR data, using ground- and vegetation-point classifications, as proposed in [42,43]. Note, however, that visual inspection and user refinements (Level 5 of data fusion) were necessary here, in order to correct manually the inevitable inaccuracies introduced by the automatic LiDAR data classification algorithms. Sampling of low ground-points and high vegetation-points into 1 × 1 km tiles was then performed with 0.5 m resolution, while data cleaning with interpolation of missing data and correction of tree heights was done according to [44]. The obtained DTM was then subtracted from the digital surface model, as obtained from sampled vegetation points, in order to define the CHM. Nevertheless, due to traditionally infrequent LiDAR data acquisitions, temporal alignment of the DTM and CHM was achieved by additionally considering:
• Forest management activities conducted after LiDAR data were recorded; and
• Vegetation growth up to the current date.
While the history of management activities was maintained with a log of the completed work orders and associated vector layers describing the region, date, and type of cleaning tasks (in accordance with risk assessments, as described by Level 2 of data fusion), tree growth predictions (as described by Level 3 of data fusion) were used to estimate vegetation development. Temporal alignment was, thus, achieved iteratively (with 1 month temporal resolution), where, in each iteration, the CHM and DTM were corrected in accordance with the power-line corridor management tasks from the previous month, followed by tree growth simulation, in order to approximate the current status of the vegetation. Finally, auxiliary raster data sources were resampled, according to [45,46], in order to achieve their alignment at 0.5 m resolution.
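The iterative temporal alignment described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the CHM is a plain list of rows, `clearances` is a hypothetical mapping from a month index to the set of cleared pixels, clearing is assumed to reset the canopy height to zero, and `constant_growth` is only a placeholder for the Level 3 growth model.

```python
from typing import Callable, Dict, List, Set, Tuple

Raster = List[List[float]]
Pixel = Tuple[int, int]  # (row, column)


def align_chm(chm: Raster,
              months: int,
              clearances: Dict[int, Set[Pixel]],
              grow: Callable[[Raster], Raster]) -> Raster:
    """Advance a CHM month by month, starting from the acquisition date."""
    current = [row[:] for row in chm]  # leave the input raster untouched
    for month in range(1, months + 1):
        # 1) Apply the clearance tasks logged for the previous month
        #    (assumption: a cleared pixel has zero canopy height).
        for r, c in clearances.get(month - 1, set()):
            current[r][c] = 0.0
        # 2) Simulate one month of vegetation growth.
        current = grow(current)
    return current


def constant_growth(chm: Raster, rate: float = 0.05) -> Raster:
    """Hypothetical placeholder: vegetated pixels gain `rate` metres a month."""
    return [[h + rate if h > 0.0 else h for h in row] for row in chm]


chm_at_acquisition = [[10.0, 0.0], [4.0, 12.0]]
aligned = align_chm(chm_at_acquisition, months=2,
                    clearances={1: {(1, 1)}}, grow=constant_growth)
```

Because each iteration only needs the previous month's raster and work-order log, the loop can be resumed from the last aligned state instead of being rerun from the acquisition date.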

Level 1-Object Assessment
In order to achieve data fusion at an object level, spatial definition of individual trees was achieved first using single tree-crown delineation on the temporally-aligned CHM, as proposed in [47]. Object assessment was then resolved by intersecting the resulting vector layer with auxiliary thematic maps, which yielded the essential individual-tree parameters described in Table 2.
In addition to the definition of the individual trees, detailed 3D geometry of the power transmission line was extracted from LiDAR data with 1.5 m resolution, as proposed by [24], resulting in a vector layer containing a little over a million 3D points. Finally, in order to provide a simplified assessment of their possible sagging, the power-line segments were attributed with their distances from the transmission towers, as well as their voltage levels.

Level 2-Situation Assessment
Following from the above, encroaching vegetation detection was performed using a 3D filter, defined by swept volume parametrisation [48]. A funnel-shaped volume generator was used for this purpose, with parametrised width, height, and side-angle in accordance with the power-line attributes. While exact definitions of these may differ in accordance with the legislation, terrain configurations and other type-specifics of power-transmission lines, the following values were used in our case (see Figure 3):
• The width of the filter was defined in accordance with the legislation, where 15 m was used for 110 kV transmission lines, while 40 m was used for higher voltage 210 kV and 400 kV power lines;
• The height of the filter was defined in accordance with the 3D shape of the lowest power-transmission line, ensuring at least 5 m clearance beneath it;
• The angle of the filter was fixed at 45° in order to prevent the risk of possible damage caused by falling high trees.
Accordingly, the volumetric filter definition was achieved by sweeping a generator along the power-transmission axes and storing it as a raster layer with 1 m resolution, in order to ensure its spatial alignment with the CHM. Thus, encroaching vegetation detection, together with the actual risk assessment, was achieved straightforwardly by pixel comparison. Finally, vector layers were generated using isoline rendering [49], as shown in Figure 4. As, generally speaking, the growth of vegetation is observed at a much higher rate than structural changes of the power transmission lines, most of the computationally expensive tasks of the proposed approach were conducted during the preprocessing step. This concerns volumetric filter definition and its rasterisation, while only its pixel comparison with the CHM is actually required during the processing. This proved to be useful, in particular when considering the predictions of encroaching vegetation on the simulated CHM and, thus, improving system performance during the threat assessment under different scenarios.
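The split between preprocessing (filter rasterisation) and runtime (pixel comparison) can be illustrated with a one-dimensional cross-section of the funnel. The sketch below rests on assumptions not stated in the text: heights are measured above ground, the permitted height is constant inside the corridor width and rises at the side-angle outside it, and the function names and the 20 m wire height are hypothetical.

```python
import math
from typing import List, Sequence


def funnel_limit(lateral_m: float, wire_height_m: float, width_m: float,
                 clearance_m: float = 5.0, angle_deg: float = 45.0) -> float:
    """Maximal permitted canopy height at a lateral distance from the axis."""
    base = wire_height_m - clearance_m      # at least 5 m below the lowest wire
    half_width = width_m / 2.0
    distance = abs(lateral_m)               # the funnel is symmetric
    if distance <= half_width:
        return base
    # Outside the corridor the limit rises with the side-angle of the funnel.
    return base + (distance - half_width) * math.tan(math.radians(angle_deg))


def encroachment(chm_row: Sequence[float],
                 limit_row: Sequence[float]) -> List[float]:
    """Runtime step: per-pixel height by which the canopy exceeds the filter."""
    return [max(0.0, h - limit) for h, limit in zip(chm_row, limit_row)]


# Preprocessing: rasterise the filter once for a 110 kV line (15 m width).
lateral = [i * 5.0 for i in range(4)]       # 0, 5, 10 and 15 m from the axis
limits = [funnel_limit(d, wire_height_m=20.0, width_m=15.0) for d in lateral]

# Runtime: compare any measured or simulated CHM against the stored raster.
chm_row = [3.0, 16.0, 17.0, 25.0]
excess = encroachment(chm_row, limits)
```

Since `limits` depends only on the power-line geometry, it can be reused unchanged for every simulated CHM during threat assessment.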

Level 3-Threat Assessment
A new context-based ensemble regression for tree growth prediction is presented in this section that proved capable of dealing with various growing conditions and, together with the encroaching vegetation detection, allowed for holistic threat assessment. The rationale behind the approach was that trees growing within similar ecological niches behave similarly, and, thus, segmenting the learning data accordingly was expected to result in an improved prediction accuracy. Furthermore, such an approach allowed for using fuzzy classification of individual trees' species (as provided traditionally by tree-species distribution maps), while also accounting for anisotropic tree-crown development (e.g., on forest edges). This was achieved by introducing contextual features, related to ecological niches, at the level of each individual CHM pixel and learned regression model, respectively (see Figure 5). Here, we follow the principle that individual tree parameters, as described in Table 2, defined the context of each individual pixel, while its relation to the surroundings was used as regression features. Accordingly, given a set of context vectors $C = \{\vec{c}(p_i)\}$, associated with learning pixels $p_i = (x_i, y_i)$, the context of a testing pixel $p_t$ was defined by a subset $C_k(p_t) \subseteq C$ of $k$ context vectors that are closest to $\vec{c}(p_t)$ according to some distance measurement. While an arbitrary mapping function $d : C \times C \to \mathbb{R}$ could be used for this purpose, contextual features were, in our case, of significantly different types and scales, and, thus, the $L_1$-norm was applied on ranked differences in feature-values rather than on the feature-values themselves.
Let a mapping function $\mathrm{rank}_f : C \times C \to \mathbb{N}$ define a standard competition ranking of the difference between $\vec{c}(p_i)$ and $\vec{c}(p_j)$ according to the feature $f$. The distance function $d$ used was then defined formally as:
$$d\left(\vec{c}(p_i), \vec{c}(p_j)\right) = \sum_{f} \mathrm{rank}_f\left(\vec{c}(p_i), \vec{c}(p_j)\right), \qquad (1)$$
where the regular difference between categorical (e.g., pH and soil quality levels), as well as numerical features (e.g., tree heights and air temperature), was used for their ranking. On the contrary, ranking of vector-type features was achieved using angular distance. Finally, the regression model $R = \{R_{p_t}\}$ was defined by a set of weak regression models $R_{p_t}$, each associated with a testing pixel $p_t$. Two types of explanatory variables were used for this purpose, namely, the pixel and tree heights, which relate to the estimated increase in CHM due to the growth of the tree itself, and the heights of the neighbouring pixels, which account for possible overgrowing of its surroundings. By defining the neighbourhood of $p_i \in C_k(p_t)$ using the Cartesian product $W_S = \{x_i - S, \ldots, x_i, \ldots, x_i + S\} \times \{y_i - S, \ldots, y_i, \ldots, y_i + S\}$, where $S \geq 0$ specifies its size, a feature vector $\vec{v}(p_i)$ is given formally by an ordered set of CHM values as:
$$\vec{v}(p_i) = \left(\mathrm{CHM}[p_i],\ \max_{p \in W_1 \setminus W_0} \mathrm{CHM}[p],\ \ldots,\ \max_{p \in W_S \setminus W_{S-1}} \mathrm{CHM}[p]\right),$$
where $W_S \setminus W_{S-1}$ refers to a set difference between two neighbourhoods $W_S$ and $W_{S-1}$, as shown in Figure 6. Note, however, that by selecting the maximal CHM value within a given $W_S \setminus W_{S-1}$, an orientation-independent definition of weak regression models was achieved, while the terrain slope orientation and corresponding tree heights, together with fuzzy tree species classification and other contextual features, were already addressed during the learning data segmentation. In a sense, the developed prediction model can, thus, be considered as a KNN regression. Similarly to the well-known KNN classification [50], this is an efficient lazy learning algorithm that, rather than providing a generalised model, uses all the training data for predicting the outcome of the target variable of testing samples.
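The ring-wise feature vector defined above can be sketched as follows. This is an illustrative reading of the definition, assuming a CHM stored as a list of rows and assuming that ring pixels falling outside the raster are simply ignored; `ring_features` is a hypothetical name.

```python
from typing import List, Sequence


def ring_features(chm: Sequence[Sequence[float]],
                  row: int, col: int, size: int) -> List[float]:
    """Return (CHM[p], max over W_1 minus W_0, ..., max over W_S minus W_(S-1))."""
    rows, cols = len(chm), len(chm[0])
    features = [chm[row][col]]              # W_0 contains only the pixel itself
    for s in range(1, size + 1):
        # The ring W_s minus W_(s-1) holds pixels at Chebyshev distance s.
        ring = [chm[y][x]
                for y in range(row - s, row + s + 1)
                for x in range(col - s, col + s + 1)
                if max(abs(y - row), abs(x - col)) == s
                and 0 <= y < rows and 0 <= x < cols]
        # Assumption: an empty ring (tiny raster) falls back to the pixel value.
        features.append(max(ring) if ring else chm[row][col])
    return features


chm = [
    [1.0, 1.0, 1.0, 1.0, 9.0],
    [1.0, 2.0, 2.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 3.0, 1.0],
    [1.0, 2.0, 2.0, 2.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
centre = ring_features(chm, 2, 2, size=2)
transposed = [list(column) for column in zip(*chm)]
centre_transposed = ring_features(transposed, 2, 2, size=2)
```

Taking the maximum per ring discards where in the ring the tallest neighbour stands, which is why the transposed raster yields the same vector.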
However, contrary to the traditional approach, the KNN search was achieved only on contextual features. Thus, although the implementation followed the optimisations proposed in [51], an arbitrary regression model could be applied for the actual predictions. As confirmed by the results (see Section 3), straightforward linear regression, applied on the K = 100 contextually most similar pixels, turned out to be the most efficient in our case.
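A minimal sketch of the context-based selection followed by a weak linear model is given below. It is not the optimised implementation referenced above: only numerical context features are handled (angular distances for vector-type features are omitted), a single explanatory variable replaces the full feature vector for brevity, and all function names and sample values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def competition_ranks(values: Sequence[float]) -> List[int]:
    """Standard competition ranking ("1224"): ties share the lowest rank."""
    first_position: Dict[float, int] = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]


def nearest_contexts(target: Sequence[float],
                     contexts: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k contexts with the smallest sum of per-feature ranks."""
    totals = [0] * len(contexts)
    for f in range(len(target)):
        differences = [abs(ctx[f] - target[f]) for ctx in contexts]
        for i, rank in enumerate(competition_ranks(differences)):
            totals[i] += rank               # L1-norm over the ranks
    return sorted(range(len(contexts)), key=lambda i: totals[i])[:k]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:                          # degenerate case: constant input
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_height(target_context, target_height, contexts,
                   heights_before, heights_after, k: int) -> float:
    """Fit a weak model on the k contextually closest pixels and apply it."""
    idx = nearest_contexts(target_context, contexts, k)
    slope, intercept = fit_line([heights_before[i] for i in idx],
                                [heights_after[i] for i in idx])
    return slope * target_height + intercept


# Hypothetical contexts: (soil pH, terrain slope in degrees).
contexts = [(5.0, 10.0), (5.1, 12.0), (7.5, 40.0), (5.0, 11.0), (7.4, 38.0)]
heights_before = [10.0, 12.0, 3.0, 14.0, 4.0]
heights_after = [11.0, 13.0, 3.2, 15.0, 4.2]
neighbours = nearest_contexts((5.0, 11.0), contexts, k=3)
predicted = predict_height((5.0, 11.0), 13.0, contexts,
                           heights_before, heights_after, k=3)
```

Ranking the differences per feature before summing keeps a feature with a large numerical range, such as slope, from dominating one with a small range, such as pH.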

Results
The proposed environmental intelligence system for vegetation management in power-line corridors (i.e., a digital twin) was implemented following a high-performance monolithic architecture in the C++ programming language and deployed on a regular workstation with an AMD® Ryzen™ Threadripper™ 1920X processor with 12 cores, 3.50 GHz frequency, 24 logical processors, L1 = 1.1 MB, L2 = 6.0 MB, and L3 = 32.0 MB of cache, and 64 GB of main memory. System validation was carried out on a test dataset $P$, containing $|P| \approx 100$ million pixels (as described in Section 2.1), from the following perspectives:
• Vegetation growth simulation accuracy was evaluated first by pixel comparison between the predicted CHM and actual CHM using the root-mean-square error (RMSE) metric, defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{|P|} \sum_{p_i \in P} \left(\widehat{\mathrm{CHM}}[p_i] - \mathrm{CHM}[p_i]\right)^2},$$
where $\widehat{\mathrm{CHM}}[p_i]$ was estimated by learning a weak regression model on the $K = 100$ contextually closest pixels $P^{100}_{p_i} \subset (P \setminus \{p_i\})$ to a pixel $p_i$ amongst all the pixels from the set $P \setminus \{p_i\}$;
• Encroaching vegetation detection validation was then achieved by comparing the areas of detected risks with the history of the performed power-line corridor cleaning tasks; and
• System performance assessment was finally carried out by estimating per-pixel simulation and encroaching vegetation detection times.
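For reference, the RMSE between a predicted and an actual CHM can be computed as in the short sketch below, which assumes both rasters have been flattened into equally long sequences of heights in metres; the sample values are invented for illustration only.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between two equally sized pixel sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Three small errors and one outlier: the outlier dominates the RMSE.
predicted_chm = [10.5, 12.0, 3.0, 8.0]
actual_chm = [10.0, 12.0, 3.5, 12.0]
error = rmse(predicted_chm, actual_chm)
```

Because the errors are squared before averaging, a few large deviations can raise the RMSE well above the typical per-pixel error.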
A detailed report of the obtained results is given in the continuation of this section.

Vegetation Growth Simulation Assessment
The validation of the vegetation growth simulation was carried out by comparing the accuracies and execution times achieved using three traditional regression approaches, namely, linear regression, KNN regression and artificial neural networks, with and without using learning data segmentation based on contextual features.
As follows from Table 3, notably higher execution times were measured when applying the proposed contextual segmentation of learning data, while this resulted in a decrease of RMSE of all tested regression models, with an average improvement of over 10%. Figure 7 provides further details about the error distribution in comparison to the distribution of the measured vegetation growths. In all the cases, contextual segmentation managed to reduce error variance, as well as its range. However, as the majority of measured errors were within the [−1, 1] range, while significantly larger RMSE was measured, the presence of outliers was apparent. As linear regression with contextual segmentation of learning samples turned out to be the most accurate, showing little to no over-fitting, spatial distribution of errors obtained in this way is discussed further.
Figure 7. Distribution of measured errors achieved by the tested regression models with (C) and without (NC) using contextual segmentation of the learning data in comparison to measured tree growth.
The comparison in Figure 8 shows that the simulated and ground-truth CHMs matched to a large extent (i.e., the yellow and orange colours in Figure 8d). However, a notable pattern of high error values is apparent, in particular, when considering forest edges. In cases of north and west edges, the proposed method overestimated the tree growth, while underestimations are more noticeable on the south and east edges. As similar, although less obvious, patterns can be noticed when considering the contours of dominant trees, larger errors were attributed to the predicted tree-crown expansion rather than to the predictions in vegetation growth.

Accuracy of Encroaching Vegetation Detection
The accuracy of the encroaching vegetation detection was validated by comparing the detected risks with the field observations carried out by the asset management at the Slovenian national electricity transmission company (Eles d.o.o.). During the 2014-2017 period, 10 corridor clearances were carried out, covering a total area of approximately 1.9 km², with the largest covering the area of 7341 m² and the smallest related to an individual tree with the area of 22 m². Within these regions, the proposed method identified approximately 132 areas of encroaching vegetation with a total area of 0.5 km², with the area of individual regions ranging from 1415 m² to 0.25 m² (i.e., an individual pixel). As follows from the example shown in Figure 9, the reason for this lay in the fact that corridor clearances were carried out over the entire inner area of the power line corridor, not selectively on the detected encroaching vegetation. On the contrary, as indicated by the gray areas in Figure 10, the proposed method identified a number of threats to the safety of the power transmission lines outside of the clearance areas. In total, 449 such regions were detected, ranging in area from 228 m² to 0.25 m², with a total area of 1.8 km². Among these, 396 were smaller than 10 m² and can, thus, be attributed to individual branches or their clusters rather than actual trees. Provided these do not pose significant threats to the safety of the power transmission line, they can be thresholded straightforwardly during the post-processing. On the other hand, a large majority, namely 34 of the remaining 53 over-detected regions larger than 10 m², were detected on the forest edges, with individual branches posing a significant threat to the safety of the power transmission line, while the tree-tops themselves had not been detected as threatening.
Their over-detection may, therefore, in significant part, be attributed to the threat assessment in the field, which is generally prone to errors. This was also confirmed by 16 over-detected trees behind the forest edge, which were, for the most part, not visible from the centre of the power line corridor. Finally, the remaining three over-detections were related to misclassified LiDAR points. Due to the reported uncertainty in the results, clearance measurement accuracy was additionally assessed on five individual trees, as described by [27]. Here, an average absolute error of 19 cm was measured, with a maximal error of 37 cm. While this accuracy is notably lower than that reported in [27], where unmanned aerial vehicle LiDAR data acquisition was used, the LiDAR data were, in our case, recorded at a higher altitude (from a helicopter) and were, thus, of lower density.
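The area-based thresholding of small detections mentioned above could be realised as in the following sketch. It assumes a boolean encroachment mask at 0.5 m resolution (0.25 m² per pixel) and 4-connected regions; the connectivity and the function names are illustrative choices, not details taken from the study.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def regions(mask: Sequence[Sequence[bool]]) -> List[Set[Pixel]]:
    """Group detected pixels into 4-connected regions (breadth-first search)."""
    rows, cols = len(mask), len(mask[0])
    seen: Set[Pixel] = set()
    found: List[Set[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            region: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            found.append(region)
    return found


def significant_regions(mask: Sequence[Sequence[bool]],
                        pixel_size_m: float = 0.5,
                        min_area_m2: float = 10.0) -> List[Set[Pixel]]:
    """Keep only regions whose area reaches the given threshold."""
    pixel_area = pixel_size_m ** 2
    return [reg for reg in regions(mask)
            if len(reg) * pixel_area >= min_area_m2]


# A 7 x 7 pixel block (12.25 m^2) and one isolated pixel (0.25 m^2).
mask = [[r < 7 and c < 7 for c in range(9)] for r in range(9)]
mask[8][8] = True
all_regions = regions(mask)
kept = significant_regions(mask)
```

With the default 10 m² threshold, only the 49-pixel block survives, mirroring the removal of branch-sized detections.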

System Performances
The performance evaluation of the encroaching vegetation risk assessment was conducted on a total of 168.5 km (as reported in Section 2.1) of power cables, modelled at 1.5 m resolution with exactly 1,000,325 points. As follows from Section 2.2.3, the proposed approach consisted of a preprocessing step that included volumetric filter definition and its rasterisation, and the actual runtime processing with definition of the situation assessment vector layer using isolines. Accordingly, the experiments were conducted by running the whole encroaching vegetation risk assessment procedure 50 times, and measuring the system processing times during these steps.
Note that the reported time complexity of the proposed stages in Table 4 was derived by dividing the test power line into randomly defined segments and estimating the execution time per number of pixels used for encroaching vegetation detection. This procedure was repeated 50 times. Together with vegetation growth simulations (see the execution times in Table 3), the whole process of threat assessment on the 24 km² area was realised in ≈10 min at 0.5 m resolution.

Discussion
As confirmed by the results, the proposed approach brings about an efficient environmental intelligence for improved vegetation management in power line corridors. By vegetation growth prediction and situation assessment, it enables predictive analytics to be achieved over the structured data fusion of LiDAR-derived data products with auxiliary thematic maps that followed the JDL/DFIG data fusion model. In this context:
• Spatio-temporal data alignment was achieved by data sub-sampling to a common resolution, while composing the current state CHM by adjusting it according to past clearance tasks and predicted vegetation growth from the time the LiDAR data were recorded. As previous studies have focused exclusively on mapping the state of power line corridors, the proposed approach offers improved monitoring capacities that prolong the relevance of the acquired data.
• Situation assessment based on parametric definition of a funnel-shaped volumetric filter can be achieved in preprocessing, which allows for fast encroaching vegetation detection. While the results achieved on high-altitude airborne LiDAR data showed slightly lower, yet comparable, accuracy to the related study performed on UAV-acquired data, significant improvements in comparison to the field-based encroaching vegetation detection have been demonstrated.
• Threat assessment was enabled by vegetation growth prediction that utilises contextual segmentation of learning data for tuning weak regression models to specific ecological niches. While this improved prediction accuracy, the proposed approach provides the first attempt towards establishing a digital twin of the power line corridor ecosystem.
Despite the reported benefits of the proposed approach, the achieved RMSE ≈ 1 m still leaves room for improvement. Notably, significantly lower accuracy was measured in the predictions of tree-crown expansion than in the predictions of tree growth, with the spatial distribution of errors indicating its dependence on sunlight conditions. While overestimations typically occurred on the north and west forest edges, underestimations were more characteristic of the south and east sides. While this can be compensated straightforwardly by an asymmetric filter definition that imposes stricter conditions on one side than the other, the actual solution to this issue may require the introduction of orientation-dependent regression features, or additional contextual features. As this requires an in-depth analysis of the impacts of individual features on the prediction accuracy through possible feature learning, together with the assessment of the method's sensitivity to the parameter K, it is considered to be beyond the scope of this paper, and will be addressed in our future work. Furthermore, as the behaviour of each individual pixel was modelled with a dedicated weak prediction model, clustering of samples based on their contextual features may significantly speed up the simulation's learning process. However, its impact on the accuracy should be studied. Finally, while the accuracy of encroaching vegetation detection is expected to improve with higher resolution datasets, an appropriate post-processing of the detected hazards is still required. In order to meet asset management requirements (i.e., data fusion Level 6), this should account for legal restrictions and cost requirements of power line corridor clearance tasks that will enable their optimal grouping, prioritising, and scheduling.