2.2.1. Inventory Data from Forest Documentation
Inventory data are obtained during periodic field measurements carried out in forest districts as part of the forest management plan review and are updated annually. During the inventory, the condition of forest stands is updated, including stand height. Inventory heights are determined on the basis of field measurements, usually taken with a hypsometer. According to the Forest Management Manual in Poland [43], the average height is determined on the basis of measurements of approximately 10 representative trees, separately for each species in the stand. The height value is rounded to 1 m. Inventory data are recorded in the State Forest Information System (SFIS).
Since 2010, all the forests managed by the State Forests of Poland have had their geospatial representation recorded in the form of a Forest Digital Map (FDM). An FDM contains a spatial representation of all objects in the forest district in a coordinate system, linked to a database. Each forest district is obliged to update the map annually with respect to economic events and ownership title. The updates are recorded in the SFIS. The data, updated annually by forest districts (collected in the FDM and SFIS), are published by the Data for Reports on Forest Condition (DRFC) and are available for download as ESRI shapefiles and through WMS and WFS services, and they are also linked to the national geoportal. Only the following layers are made available: the boundaries of forest districts, forest ranges, forest divisions, and forest units. The vector layers are accompanied by a tabular data package which includes identification data of the forest unit, natural and habitat data (such as habitat type, tree stand type, forms of nature protection), and stand assessment data (such as species composition, age of main species, site classification, stand density, stand height, growth). The WFS service publishes only current data. Historical data (going back no further than 2015) can be downloaded only from the DRFC website.
This study is based on two of the many thematic layers available in the FDM whose scope is related to the analysed issue: the layer of forest units and the layer of non-exclusive areas (NEAs). Forest units are the basic unit of area division of land under the administration of a forest district, defined in terms of their boundaries and assigned a unique forestry address in the country. For forested areas, they are separated based on differences in certain properties of forest stands, including the dominant species in the forest stand and the need to establish appropriate economic guidelines for a given part of the forest stand, ensuring the appropriate accuracy of taking inventory of wood resources [44,45,46].
Adjacent forest compartments may have the same dominant species but differ in age, vertical structure, the way the forest stand was formed, the origin of the forest stand, the share of co-dominant and admixture species, the dominant type of density, the class of damage to the forest stand, the forest site type, and the site quality class of the forest stand, understood as a difference in height of up to 5 m. In general, all of the above factors should be consistent within a single forest unit. For example, pine is the dominant species in a significant part of the study area. However, despite the uniformity of the dominant species, the area is divided into units due to differences in the other characteristics mentioned above. If a fragment of forest that stands out from its surroundings is too small to be treated as a separate unit, it is classified as a non-exclusive area (NEA). Non-exclusive areas are small areas occurring in the forest stand whose properties differ from those of the surrounding forest stand, yet which do not meet the criteria to be considered separate forest units. They include gaps or open spaces devoid of trees, renewed and unrenewed sites with a surface area of up to 50 ares (0.5 ha) created as a result of polycyclic harvesting, and clusters of trees that differ from their surroundings in terms of species or age and would otherwise qualify as separate forest units but do not meet the minimum area criterion.
For the research presented here, data for the year 2022 (forest unit and forest stand height layers) were collected from the DRFC. The forest unit layer is distributed with tabular data on the properties of species in forest stands (including information about height). Data for 2013 are not available from the DRFC; therefore, the forest unit layer for that year was obtained from the archives of the Przymuszewo Forest District. As the archived layers do not contain information about the height of the species, tabular height data (collected from the SFIS database in the form of text files) were attached. The non-exclusive area (NEA) layer for both periods was obtained from the archives of the forest district, as this layer is not distributed by the DRFC. The NEA layer was used to filter out tree crown height points situated outside the main stand area. This allowed us to eliminate the influence of trees that do not belong to the core of the forest stand on its height.
The analysed years belong to different revisions of the forest management plan and have different data sources (the DRFC or the archive data of the forest district), which is why there are discrepancies in the data structure. We took this discrepancy into account in our experiment. In the first stage of preparing the inventory data, we adjusted the data to have the same structure so that further work for both dates could proceed in the same way (data integration, filtration, and buffering). The process of preparing forestry data is presented in Figure 3.
Firstly, all the necessary data from both years had to be merged. The layers of forest units and the NEAs are necessary for the analysis. As such, geometric objects that define the borders between forest stands that differ in terms of survey properties should be accompanied by information about the height of the forest stand. For newer data (from 2022), the information about height is integrated with the borders of forest units and published by the DRFC. However, as historic data require the appropriate data structure, the vector data (forest unit layer) were accompanied by a table with the heights of forest stands. Moreover, the attributes of the forest unit layer from the year 2013 are different from those of the files currently published by the DRFC. In order to maintain consistency with newer data, an additional table was attached, containing the description of the dominant species of the forest unit, type of surface, surface area, forest site type, etc. A table with forest addresses was also added to the forest unit layer to enable unambiguous identification of the location of forest units, which, in turn, allowed us to compare the increases in height based on forestry data and LiDAR data.
Analysis of large amounts of data requires setting certain objectives that will enable us to reduce the resulting amount of data, i.e., to remove data that are redundant from the point of view of the subject of the research. First of all, all layers with their attributes were limited to the area of interest (the Zbrzyca forest range). Apart from that, it was necessary to narrow the scope of data in the research area. Excess data (such as the name of the forest stand layer, the number of species in the order of presence, the thickness of large timber in the forest unit) were removed. These parameters of tree stands are important in forestry and appear in the attribute table of the forest unit layer, but they are not useful in this study, which focuses only on the analysis of height increments in the forest units. The timber volume and forest stand growth tables [47] used in forestry show the thickness of large timber (i.e., the volume of the tree stand and its increase) starting from the age of 20 years and a height of 6 m. As such, for the purposes of this study, points (trees) lower than or equal to 6 m were rejected [8].
Excess attributes that were irrelevant for the conducted analysis were removed from the forest unit and NEA layers. An index of the year of origin was added to the other attributes in order to distinguish between them in further analyses. Data were filtered to remove layers other than the forest stand and the floor, so as to leave only the main height of the tallest layer of the forest. Then, all species apart from the dominant species were filtered out. The final stage involved creating a buffer around the NEA layers. Due to the origin of this layer (GNSS measurements taken by field workers of the SF with tourist class receivers), significant discrepancies between the position, size, and shape of the objects and the actual objects may be observed. In order to eliminate error, layer objects were enlarged by a 2 m wide buffer (half of the width of the average crown of a mature European Scots pine). Data preparation resulted in creating historic and current vector layers of forest units and non-exclusive areas with the same data structure. These layers only contained the area of interest and the data that were essential for further analysis. Both resulting forest unit layers contained information about the height of forest stands.
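The effect of the 2 m buffer on tree top filtering can be illustrated with a minimal sketch. It assumes that an NEA is a simple polygon given as a list of (x, y) vertices in metres; the function names are hypothetical and the code is not taken from the study, which most likely used GIS software for this step.

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices of a simple polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        # Count edges crossed by a horizontal ray running to the right of pt.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_segment(pt, a, b):
    """Shortest distance from pt to the segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_buffered_polygon(pt, poly, buffer_m=2.0):
    """True if pt lies inside poly or within buffer_m of its boundary."""
    if point_in_polygon(pt, poly):
        return True
    n = len(poly)
    return any(
        distance_to_segment(pt, poly[k], poly[(k + 1) % n]) <= buffer_m
        for k in range(n)
    )

# Hypothetical 10 m x 10 m NEA.
nea = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(in_buffered_polygon((11.5, 5), nea))  # inside the 2 m buffer
print(in_buffered_polygon((12.5, 5), nea))  # outside the buffer
```

A point that falls in the buffered NEA would be excluded from the stand height calculation.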
In order to calculate the increase in height, it was necessary to merge the data from the analysed years. Due to the changes in the spatial division (resulting from different revisions of the forest management plan) that consisted of dividing forest units and the resulting changes in the addresses of forest units, the data had to be merged spatially rather than by attribute (i.e., by forest address). Pursuant to the principles of creating forest units, they may be merged, provided that the difference in age does not exceed the values specified for the given range. As a result, in 2022, some of the forest units that were separated from neighbouring areas in 2013 constituted single forest units, with the age averaged for the whole forest stand. The borders between forest units also changed between the two revisions as a result of economic events (logging) or for other reasons (separating conservation areas, e.g., nest protection areas, or adjustments in the geometry of land plots in the land register). Considering the above, 236 forest units were selected (out of 458 in 2022 and 451 in 2013) based on an analysis of the intersection of the forest unit layers from 2013 and 2022. For these forest units, the increases in height based on SFIS data and those calculated from LiDAR data were then compared. Next, the polygon layers, with attached data on the height and number of trees from 2013 and 2022, were intersected. Due to the geometric differences in the contours of objects, the resulting layer contained over five times more objects than the input layers, many of which had a negligible surface area. In order to identify the objects that contained the main area of overlap of forest units from the analysed years, a spatial selection was performed by intersecting them with a point layer of the poles of inaccessibility of the selected forest units.
Data analysis consisted of comparing the ages of forest stands. The boundaries of the cuttings did not significantly change their shape. However, in the 9 years between the LiDAR scans, a forest stand may have been cut, which is revealed by the age attribute of the dominant species. Forest units that had a negative age difference between 2022 and 2013 were therefore removed from the selected data. Forest units that were devoid of trees in 2013 (cuts) were also rejected. As a result, 218 forest units remained to be analysed. The analysed forest units did not include those that had been classified as forest units with miscalculated height in the previous analyses. The forest units selected for analysis are presented in Figure 4.
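The age-based selection described above can be sketched as follows. The record layout (keys `address`, `age_2013`, `age_2022`) and the sample values are hypothetical; a missing age (`None`) is assumed here to mean that the unit had no trees in that year.

```python
def select_units(units):
    """Keep forest units that were stocked in 2013 and did not get younger by 2022."""
    kept = []
    for unit in units:
        age_old, age_new = unit["age_2013"], unit["age_2022"]
        if age_old is None or age_new is None:
            continue  # no trees recorded (e.g., a cut area)
        if age_new - age_old < 0:
            continue  # negative age difference: stand was cut and renewed
        kept.append(unit)
    return kept

# Hypothetical records, not data from the study.
units = [
    {"address": "unit-a", "age_2013": 40, "age_2022": 49},
    {"address": "unit-b", "age_2013": 80, "age_2022": 3},
    {"address": "unit-c", "age_2013": None, "age_2022": 7},
]
print([u["address"] for u in select_units(units)])
```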
2.2.2. ALS Data
LiDAR (Light Detection and Ranging) measurement data from airborne laser scanning (ALS) are a representation of the terrain in the form of a cloud of measurement points of the specific coordinates XYZ. The files are saved in LAS format, and apart from point coordinates, they contain, among others, information about the class of a given point and the intensity of signal reflection. Points may also be assigned RGB values (corresponding to red, green, and blue) obtained from aerial photos. The LAS file format, developed by the ASPRS (American Society for Photogrammetry and Remote Sensing), has become the industry standard for the storage of data from airborne laser scanning.
The research was conducted based on LiDAR point clouds with an average density of 4 pts/m² and an accuracy of 0.15–0.25 m. Each point in the LiDAR data cloud saved in LAS format has the following attributes: coordinates (X, Y, Z), Intensity, Number of Returns, Classification, Scan Angle Rank, GPS Time, and RGB/RGB + NIR data (optional). Figure 5 presents the point cloud from airborne laser scanning in natural colours and coloured according to increasing return intensity. The information on return intensity is particularly useful for the classification of points in the cloud.
In LAS files, classes defined by the ASPRS Standard are distinguished in the form of classification codes. These are numerical codes assigned to a point, specifying that the point belongs to a defined class of objects (e.g., ground, low vegetation, building, or water), as shown in Table 1. Point classes allow us to distinguish various types of objects situated above ground and the ground itself. Figure 6 presents the data from airborne laser scanning for the area of interest, where the colours represent the corresponding classes, as presented in Table 1.
Historic (7 March 2013) and current (26 April 2022) LiDAR data were recorded in the winter season, i.e., after the end of growth from the previous year and before the start of growth in the current year. The data were obtained in the form of several map sheets that cover the area of the Zbrzyca forest range (Figure 7). Each sheet corresponds to a separate LAS file. Later, all files were combined to form consistent sets of historic and current LiDAR data for the whole analysed area.
LiDAR data may be useful, provided that they have been prepared correctly. Figure 8 presents a flow chart of the methodology of LiDAR data processing used to determine increases in the height of forest stands. When laser data are captured, reflections from objects other than the object of interest, as well as noise, are recorded in the point cloud. Such interference hinders data analysis and makes it less accurate. In order to eliminate such errors, the point cloud was filtered: unclassified points, noise, water, and overlap points were rejected.
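The class-based filtration can be sketched as below. The codes are assumed from the ASPRS LAS standard classification (1 = unclassified, 7 = low point/noise, 9 = water, 12 = overlap) and should be checked against Table 1; the tuple layout `(x, y, z, classification)` and the sample points are hypothetical.

```python
# Assumed ASPRS codes for the rejected classes: unclassified, noise, water, overlap.
REJECTED_CLASSES = {1, 7, 9, 12}

def filter_points(points, rejected=REJECTED_CLASSES):
    """Keep points whose classification code is not in the rejected set."""
    return [p for p in points if p[3] not in rejected]

# Hypothetical points: (x, y, z, classification).
cloud = [
    (0.0, 0.0, 101.2, 2),   # ground
    (0.3, 0.1, 121.7, 5),   # high vegetation
    (0.5, 0.2, 250.0, 7),   # noise
    (0.9, 0.4, 100.1, 9),   # water
    (1.1, 0.6, 110.4, 1),   # unclassified
]
print(filter_points(cloud))
```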
The next stage consisted of converting the point cloud to raster form. Two models were developed based on the point cloud: the DSM (Digital Surface Model) and the DTM (Digital Terrain Model). Both models were calculated with a resolution of 0.5 m, using Delaunay triangulation to create a grid of triangles and linear interpolation within it (Equations (1) and (2)). The DSM was calculated based on the first returns (from tree crowns), while the DTM was based on the last returns (from the ground).
$$DSM_{i,j} = TIN\left(z^{first}_{i,j},\, r\right) \tag{1}$$
$$DTM_{i,j} = TIN\left(z^{last}_{i,j},\, r\right) \tag{2}$$
where $z^{first}_{i,j}$ and $z^{last}_{i,j}$ are the heights of the point in the LiDAR cloud at coordinates $(i, j)$ for the first and last return, respectively; $TIN$ is the algorithm for the interpolation of the grid of triangles (Delaunay triangulation); and $r$ is the DSM and DTM spatial resolution. For the purposes of this study, $r = 0.5$ m.
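The linear interpolation step inside a single triangle of the grid can be sketched with barycentric coordinates. This is only an illustration under stated assumptions: building the Delaunay triangulation itself is omitted, and the function name and sample triangle are hypothetical.

```python
def interpolate_in_triangle(p, a, b, c, tol=1e-12):
    """Linearly interpolate z at p = (x, y) inside the triangle a, b, c.

    Each vertex is an (x, y, z) tuple. Returns None if p lies outside
    the triangle or the triangle is degenerate.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None  # vertices are collinear
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    l3 = 1.0 - l1 - l2
    if min(l1, l2, l3) < -tol:
        return None  # p is outside the triangle
    return l1 * z1 + l2 * z2 + l3 * z3

# Hypothetical triangle of first returns; value at a raster cell centre.
tri = ((0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0))
print(interpolate_in_triangle((0.25, 0.25), *tri))
```

Evaluating such an interpolation at every cell centre of a 0.5 m grid yields the raster model.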
Then, the CHM (Canopy Height Model) was determined from Equation (3). The CHM is a differential raster of the DSM and the DTM, with a resolution and extent identical to those of the two models:
$$CHM = DSM - DTM \tag{3}$$
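As a small illustration, the cell-by-cell difference can be written as below, assuming both rasters are equally sized lists of rows and that `None` marks cells without data (an assumption of this sketch, not a detail given in the study).

```python
def canopy_height_model(dsm, dtm):
    """Cell-wise DSM - DTM; a cell is None if either input cell is None."""
    return [
        [None if s is None or t is None else s - t for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(dsm, dtm)
    ]

# Hypothetical 2 x 2 rasters (elevations in metres).
dsm = [[120.5, 118.0], [None, 125.0]]
dtm = [[100.5, 100.0], [101.0, 101.0]]
print(canopy_height_model(dsm, dtm))
```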
To minimise the influence of outliers on the model, Gaussian filtration was applied with a smoothing filter, denoted $F$, based on Equation (4):
$$CHM_F(i,j) = \sum_{m}\sum_{n} CHM(i+m,\, j+n)\, F(m,n) \tag{4}$$
where $CHM_F(i,j)$ is the result of Gaussian filtration for point coordinates $(i,j)$, while $(m,n)$ are the coordinates of kernel $F$, which is defined in Equation (5).
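A minimal sketch of such a smoothing pass is given below. The 3 × 3 kernel is a common discrete approximation of a Gaussian and is an assumption of this example; the kernel actually used in the study is the one given in its Equation (5). Cells marked `None` are skipped, and the weights are renormalised at raster edges.

```python
# Assumed 3 x 3 Gaussian-like kernel (weights sum to 16); not the study's kernel.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

def gaussian_smooth(chm, kernel=KERNEL):
    """Weighted moving average of chm with the given square kernel."""
    rows, cols = len(chm), len(chm[0])
    half = len(kernel) // 2
    out = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if chm[i][j] is None:
                continue
            total = 0.0
            weight = 0.0
            for m in range(-half, half + 1):
                for n in range(-half, half + 1):
                    r, c = i + m, j + n
                    if 0 <= r < rows and 0 <= c < cols and chm[r][c] is not None:
                        w = kernel[m + half][n + half]
                        total += w * chm[r][c]
                        weight += w
            # Dividing by the weights actually used keeps edges unbiased.
            out[i][j] = total / weight
    return out

# A single spike is spread over its neighbours.
spike = [[0.0, 0.0, 0.0], [0.0, 16.0, 0.0], [0.0, 0.0, 0.0]]
print(gaussian_smooth(spike)[1][1])
```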
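As with the forestry documentation data, excess data were removed from the smoothed Canopy Height Model (CHM). Objects (trees) lower than 7 m were therefore cut out of the model and replaced with NA values (Equation (6)):
$$CHM_T(i,j) = \begin{cases} NA, & CHM_F(i,j) < 7\ \text{m} \\ CHM_F(i,j), & \text{otherwise} \end{cases} \tag{6}$$
where $CHM_F$ denotes the smoothed CHM and $CHM_T$ the model after thresholding.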
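The thresholding step is simple enough to express in a few lines. In this sketch `None` stands in for NA, and the function name is hypothetical.

```python
def mask_low_canopy(chm, min_height=7.0):
    """Replace cells lower than min_height (in metres) with None (NA)."""
    return [
        [None if v is None or v < min_height else v for v in row]
        for row in chm
    ]

print(mask_low_canopy([[6.9, 7.0], [None, 21.3]]))
```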
The last step in LiDAR data processing was the detection of tree tops and exporting them as points to a vector file. In order to detect tree tops, a function f(x) was defined that controls the size of the raster window in which local height maxima, interpreted as tree tops, are sought (Equation (7)).
In order to determine the slope and the intercept of the function, the available data sources on the relationship between tree height and crown size were analysed. Figure 9 shows the relationship between crown size and tree height, along with the trend line.
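A variable-window local maximum search of this kind can be sketched as follows. The linear window function and its coefficients (`slope=0.05`, `intercept=0.5`) are purely illustrative assumptions, not the values fitted in the study; the search radius is expressed in metres and `None` marks NA cells.

```python
import math

def window_radius(height, slope=0.05, intercept=0.5):
    """Hypothetical linear window function: search radius (m) for a given height (m)."""
    return slope * height + intercept

def detect_tree_tops(chm, resolution=0.5, radius_fn=window_radius):
    """Return (row, col, height, radius) for cells that are strict local maxima."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for i in range(rows):
        for j in range(cols):
            h = chm[i][j]
            if h is None:
                continue
            radius = radius_fn(h)
            reach = int(radius // resolution)  # window half-size in cells
            is_top = True
            for di in range(-reach, reach + 1):
                for dj in range(-reach, reach + 1):
                    if di == 0 and dj == 0:
                        continue
                    r, c = i + di, j + dj
                    if not (0 <= r < rows and 0 <= c < cols):
                        continue
                    if math.hypot(di, dj) * resolution > radius:
                        continue  # outside the circular window
                    v = chm[r][c]
                    # Ties disqualify the cell, so flat plateaus yield no tops.
                    if v is not None and v >= h:
                        is_top = False
                        break
                if not is_top:
                    break
            if is_top:
                tops.append((i, j, h, radius))
    return tops

# Hypothetical 5 x 5 CHM (heights in metres) with one crown in the centre.
chm = [
    [10, 10, 10, 10, 10],
    [10, 12, 12, 12, 10],
    [10, 12, 20, 12, 10],
    [10, 12, 12, 12, 10],
    [10, 10, 10, 10, 10],
]
print(detect_tree_tops(chm))
```

Making the window grow with height reflects the observation that taller trees tend to have wider crowns, so a larger neighbourhood is needed to avoid detecting several tops within one crown.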
The stages of data processing resulted in vector files in shapefile format that contain the point geometries of the detected tree tops with the following attributes: identifier (ID), tree top height, and the radius of search for the local maximum height. These layers contain excess points that are located outside the forests managed by the Przymuszewo Forest District, Zbrzyca forest range. They also lack an identifier that would locate a given tree top in a forest unit. The tree top layer was therefore filtered to remove irrelevant information. After removing excess points located outside the land of the Zbrzyca forest range, points situated in forest units but not in forest stands (marshes, woodlots in fields, pastures, meadows, etc.) were also rejected, as were points located in non-exclusive areas. Additionally, the forest address was added to identify the precise location of the forest stand and link it to a specific forest unit.
In order to compare the results of LiDAR data processing to the work of a surveyor (who selects several representative trees for measurement, and therefore, in general, rejects trees that are significantly lower or higher than average), a weighted average was calculated, taking into account the number of occurrences of a given height in the specific forest unit. Firstly, the height was rounded up to full meters (according to the accuracy of measurements taken with the use of other methods and to the data that are used in the tables of thickness, volume, and growth). This enabled us to count the number of occurrences of each height in a given forest unit. The weighted average for each forest unit was calculated after the data were merged with the forest unit layer and the forest address key. This enabled us to avoid performing the calculations for several hundred thousand points. The weighted average was calculated from the general formula shown below (8):
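$$\bar{h}_A = \frac{\sum_{i=1}^{n} w_i h_i}{\sum_{i=1}^{n} w_i} \tag{8}$$
where $[h_1, h_2, \ldots, h_n]$ is the set of tree heights in forest unit $A$ and $[w_1, w_2, \ldots, w_n]$ is the set of weights, in which $w_i$ is the number of occurrences of height $h_i$ in forest unit $A$. The weight $w_i$ is calculated as follows (9):
$$w_i = \left|\{\, t \in A : h(t) = h_i \,\}\right| \tag{9}$$
where $t$ ranges over the tree tops detected in forest unit $A$, $h(t)$ is the rounded height of tree top $t$, and $w_i$ is the weight corresponding to the number of occurrences of a given height in forest unit $A$.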
This stage results in a vector layer, in which the points indicate individual tree tops. Then, for each forest unit, the weighted average height, which corresponds to the height of the forest stand in the forest unit, was calculated.
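The weighted average can be sketched as below. The example follows the wording "rounded up" by using `math.ceil`; if rounding to the nearest metre is intended, `round` could be substituted. The function name and sample heights are hypothetical.

```python
import math
from collections import Counter

def weighted_stand_height(tree_heights):
    """Count-weighted average of tree top heights rounded up to full metres."""
    counts = Counter(math.ceil(h) for h in tree_heights)  # height -> occurrences
    total_weight = sum(counts.values())
    if total_weight == 0:
        return None  # no tree tops in this forest unit
    return sum(h * w for h, w in counts.items()) / total_weight

# Hypothetical tree top heights (m) detected in one forest unit.
print(weighted_stand_height([19.2, 19.8, 20.5, 20.1, 21.0]))
```

Because the weights are occurrence counts, the result equals the plain mean of the rounded heights; grouping by height simply reduces the number of values that have to be carried through the merge with the forest unit layer.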