4D Building Reconstruction with Machine Learning and Historical Maps

The increasing importance of three-dimensional (3D) city modelling is linked to these data’s different applications and advantages in many domains. Images and Light Detection and Ranging (LiDAR) data availability are now an evident and unavoidable prerequisite, not always verified for past scenarios. Indeed, historical maps are often the only source of information when dealing with historical scenarios or multi-temporal (4D) digital representations. The paper presents a methodology to derive 4D building models in the level of detail 1 (LoD1), inferring missing height information through machine learning techniques. The aim is to realise 4D LoD1 buildings for geospatial analyses and visualisation, valorising historical data, and urban studies. Several machine learning regression techniques are analysed and employed for deriving missing height data from digitised multi-temporal maps. The implemented method relies on geometric, neighbours, and categorical attributes for height prediction. Derived elevation data are then used for 4D building reconstructions, offering multitemporal versions of the considered urban scenarios. Various evaluation metrics are also presented for tackling the common issue of lack of ground-truth information within historical data.


Introduction
Historical maps are the most powerful source of information for understanding urban phenomena and changes that contributed to defining our cities' actual shape. The growth and transformation of the urban patterns and landscapes can be analysed through these differently accurate, symbolised, and generalised representations of reality. Historical maps represent a graphically coded reduction of the three-dimensional (3D) world in the 2D space, which summarises the urban environment's main features. The cities' growth and changes-conditioned by preferred directions of expansion, natural constraints, and particular historical events-are impressed in these documents with several informative levels. Nevertheless, the 2D space-reduction of the maps entails an unavoidable loss of information, and particularly on the height of the built and natural environment.
With the advent of digital technologies, a more realistic and complete representation of the world has become possible in its three and four dimensions (3D/4D). Several geomatic and modelling techniques have been developed in the last years to generate 3D/4D city models, derived with different levels of automation and input data [1][2][3][4]. 3D/4D models can be digital copies of our cities when enriched with textural information or semantically enhanced when the geometry is linked to other attributes. Depending on their nature, they can be used for visualisation, simulations, geospatial analyses, planning activities, and many other applications [5]. The undisputable advantage offered by the three dimensions is the broader comprehension of the built spaces and relations in the urban pattern, as well as their interaction with the natural elements. Modelling in 3D multi-temporal versions of the same city (4D) can broaden how these relations and interactions are changed over time. Multi-temporal analyses and modelling of urban environments is typically performed using -evaluation of multiple regressors to infer building heights from historical maps; -introduction of the geometric, neighbourhood, and categorical features usable when only digitised building footprints are available; -testing and evaluation of the proposed method on two different locations and multitemporal historical maps; and the realisation of 4D level of detail 1 (LoD1) building models for the geo-visualisation enhancement, creating a 4D cadastre, and spatial analysis purposes (e.g., volumetric density studies.

Related Works
3D city models are simplified digital replicas of the urban environments, mainly defined by the building blocks' geometry, mutual relations, and the interaction with natural elements. 3D city modelling is a vast research area, centred on developing solutions to create models with several techniques, source data, and automation levels [7][8][9]. The application fields have extraordinarily increased in the last decades, bringing significant advantages in many domains (e.g., urban planning, crises and risks management, simulation studies) [10][11][12]. 3D city models are generally obtained from 3D surveyed reality-based data [13,14], employing SAR (Synthetic-Aperture Radar) techniques [15], extruding 2D building footprints [16], through procedural modelling [17], or volunteered geoinformation [18]. Elevation data are mostly derived from LiDAR airborne scanning or image-based procedures and are differently used for modelling purposes. Based on the available source data, employed approach, and the field of application, buildings can be represented with five levels of detail (LoD) [19] and stored with well-defined standards, such as CityGML [20,21] or CityJSON [22]. For visualisation and simple data analyses, the LoD1 (i.e., prismatic block with flat roofs) is sufficient and preferred. According to the available elevation data, the simplest and most common approach for generating LoD1 models is the extrusion of building footprints, generally considering the median or maximum height value [23,24]. When elevation data are missing, a rough building height estimation can be performed with other methods, e.g., counting the number of storeys, considering local regulations and related construction restrictions, or measuring the shadows' length in imagery data [25][26][27][28]. The derived height values' quality is rarely verified and out of scope in many applications with these approaches.
For the 3D reconstruction of historical urban scenarios, buildings' footprints and some neighbours or categorical features are the only information usable for the prediction. In [28] and [46], buildings' heights are inferred exploiting some machine learning techniques, relying on cadastral and statistical data, as well as some geometrical information extracted from the footprints. We hereafter present a similar approach, based only on geometric, neighbour, and categorical features computable from the digitised historical buildings. With respect to other methods, the proposed approach does not include the number of storeys [28] (not always available and which could significantly vary in each regional contexts), and it only relies only on a set of features ("predictors"-Section 4.2) derived from the available building footprints, combined with data augmentation methods to derive accurate results over multiple years and locations.

Trento (Italy) Historic City Centre
As a part of the TOTEM project, maps from four different years (1851, 1887, 1908, and 1936- Figure 1) were employed for this study. Throughout these years, the more significant changes in the city's structure have been related to (i) the alteration of the river's course, which conditioned the urban sprawl of the city and (ii) the progressive demolition of most of the defensive medieval walls. These changes in the urban pattern and the expansion of man-made structures are visible throughout the maps, whereas building functions or planned urban interventions are differently coded.
While in the oldest map (the year 1851), the buildings' aggregation in blocks and their relation with the cultivated areas are quite detailed, the 1887 map features a more draft representation of the urban landscape ( Figure 1), with buildings aggregated in large footprints. In this case, only a few significant civil and religious buildings are sufficiently mapped, while other structures can be identified only by comparing historical data. The informative level of the last two maps (years 1908 and 1936) is instead suitable enough for highlighting the main features and transformation of the built urban environment (Table 1). After the manual digitisation of building footprints and each historical map's georeferencing, several attributes (predictors-Section 4.2) were computed. For training the predictive models, footprints and their respective height values derived from the actual topographic data (2016) were employed (Table 2). From this data set ( Figure 2) some temporally inconsistent constructions, as recent and industrial structures, were removed to avoid erroneous predictions.  (1851, 1887, 1908, and 1936). Please note the different levels of detail of building footprints, e.g., between 1851 and 1887: in the latter case, footprints are bigger and include multiple buildings with regard to the other maps.
After the manual digitisation of building footprints and each historical map's georeferencing, several attributes (predictors-Section 4.2) were computed. For training the predictive models, footprints and their respective height values derived from the actual topographic data (2016) were employed (Table 2). From this data set ( Figure 2), some temporally inconsistent constructions, as recent and industrial structures, were removed to avoid erroneous predictions. As reported in Section 5.1, two buildings' classes proved to be an under-represented category in the training data, namely religious structures and civil towers. A data augmentation approach was therefore applied to reach a more balanced class representation and more accurate reconstructions of these few but relevant buildings, describing the city-skyline and aspect.   As reported in Section 5.1, two buildings' classes proved to be an under-represented category in the training data, namely religious structures and civil towers. A data augmentation approach was therefore applied to reach a more balanced class representation and more accurate reconstructions of these few but relevant buildings, describing the city-skyline and aspect.

Bologna (Italy) Historic City Centre
As a further case study, the historic city centre of Bologna was considered. Two historical maps describing the city in 1884 and 1945 were selected and manually digitised in GIS Environment ( Figure 3). In the older case, medieval defensive walls surrounding the city are still clearly visible, while in the 1945 map, damages of the Second World War and planned building reconstruction interventions are reported. Moreover, in this case, the two historical maps' level of information is quite different. In the oldest representation (1884), few details on the building blocks partitioning are provided with respect to the more recent map (Table 3).

Bologna (Italy) Historic City Centre
As a further case study, the historic city centre of Bologna was considered. Two historical maps describing the city in 1884 and 1945 were selected and manually digitised in GIS Environment ( Figure 3). In the older case, medieval defensive walls surrounding the city are still clearly visible, while in the 1945 map, damages of the Second World War and planned building reconstruction interventions are reported. Moreover, in this case, the two historical maps' level of information is quite different. In the oldest representation (1884), few details on the building blocks partitioning are provided with respect to the more recent map (Table 3). Some characteristics of the actual topographic data ( Figure 4) used for training the regression models are reported in Table 4. Results of building heights prediction are presented in Section 5.3.  Some characteristics of the actual topographic data ( Figure 4) used for training the regression models are reported in Table 4. Results of building heights prediction are presented in Section 5.3.   Some characteristics of the actual topographic data ( Figure 4) used for training the regression models are reported in Table 4. Results of building heights prediction are presented in Section 5.3.

Methodology
This section introduces the tested regression methods for inferring buildings' heights from digitised historical maps (Section 4.1). Predictors (Section 4.2), data augmentation (Section 4.3), and evaluation metrics (Section 4.4) are then presented. Heights values, predicted with some machine and deep learning techniques, are thus used for obtaining multi-temporal 3D models of the case studies in LoD1. The method is based on the construction of regression models ( Figure 5), able to predict the values of a target variable based on some predictor variables.
This section introduces the tested regression methods for inferring buildings' heights from digitised historical maps (Section 4.1). Predictors (Section 4.2), data augmentation (Section 4.3), and evaluation metrics (Section 4.4) are then presented. Heights values, predicted with some machine and deep learning techniques, are thus used for obtaining multi-temporal 3D models of the case studies in LoD1. The method is based on the construction of regression models ( Figure 5), able to predict the values of a target variable based on some predictor variables.

Machine and Deep Learning for Regression Models
Within the machine and deep learning applications, the regression problem is a common task. Regression analysis is a predictive modelling technique that predicts numerical variables y, typically called target, based on one or multiple variables x (predictors) [47]. Thus, a regression model aims to build a mathematical equation that defines the target y as a function of the predictor x. Once the model is trained, it can use new predictor values to infer y. In this work, several and common regression models, mainly available in the scikit-learn library [48], were tested for inferring the building heights from a set of predictor variables (Section 4.2), in particular: (a). Ordinary Least Squares Linear regressor: this model minimises the residual sum of squares between the observed and target variables. It assumes a linear connection between outputs and predictor variables, and it is sensitive to random errors when variables are not independent [49]. (b). Random Forest regressor: it is a supervised learning algorithm based on ensemble learning. Random Forest combines multiple decision trees (reducing the variance and overfitting) resulting in an averaged prediction of the individual classifiers. It

Machine and Deep Learning for Regression Models
Within the machine and deep learning applications, the regression problem is a common task. Regression analysis is a predictive modelling technique that predicts numerical variables y, typically called target, based on one or multiple variables x (predictors) [47]. Thus, a regression model aims to build a mathematical equation that defines the target y as a function of the predictor x. Once the model is trained, it can use new predictor values to infer y. In this work, several and common regression models, mainly available in the scikit-learn library [48], were tested for inferring the building heights from a set of predictor variables (Section 4.2), in particular: (a). Ordinary Least Squares Linear regressor: this model minimises the residual sum of squares between the observed and target variables. It assumes a linear connection between outputs and predictor variables, and it is sensitive to random errors when variables are not independent [49]. (b). Random Forest regressor: it is a supervised learning algorithm based on ensemble learning. Random Forest combines multiple decision trees (reducing the variance and overfitting) resulting in an averaged prediction of the individual classifiers. It also provides straightforward methods for the features' importance analysis and selection [50]. (c). CatBoost regressor: it uses gradient boosting on decision trees. The decision tree is used as a weak base learner, while gradient boosting iteratively fits a sequence of these trees [51]. (d). Support Vector regressor with the Radius Basis Function (RBF) kernel: it produces a model depending only on a subset of the training data. The employed cost function ignores samples whose prediction is close to their target [52]. (e). Multilayer Perceptron regressor: it is a neural network model where neurons are arranged in different layers, connected by differently weighted joints [53]. The model optimises the squared loss using, e.g., stochastic gradient descent.

Predictors
A predictor is a variable used to train a specific learning model to predict something. According to their informative level and reported details, historical maps are the only source for deriving information about building footprints and their spatial distribution in the past. Aside from the building's shape, some blocks' designated use is generally the only additional information obtained from these documents. Data on building heights can be exclusively derived comparing blocks still existing in the same shape, or, approximately, from available photos in the other cases. When data are only derived by historical maps, the selection of the predictors is conditioned by the lack of further cadastral information (e.g., the number of storeys) or aerial views. Therefore, geometric properties and position of the digitised blocks, as well as data from neighbourhood analyses, are the only features usable for the prediction. In this work, three different types of attributes were computed for each digitised building footprint and used as predictors for inferring building heights: -Geometric predictors: (1). Area: defined as the building footprint area; (2).
NPI: the normalised perimeter index, an indicator of the polygon shape complexity. It is computed as the ratio of the perimeter of the equal-area circle and the perimeter of the shape (1): Vertices: the number of vertices of a digitised polygon ( Figure 6a); (5).
Length MBR: the length of the minimum bounding rectangle (MBR) of a footprint; (6).
Width MBR: the width of the minimum bounding rectangle (MBR) of a footprint; (7).
Area MBR: the area of the minimum bounding rectangle (MBR) of a footprint; (8).
Ratio: the ratio between the area of a footprint and the area of the corresponding minimum boundary rectangle (MBR).  (2): where i = 1 . . . ,n, are the input points, pop is the population field of the point i, and dist is the distance between point i and the (x, y) location.
-Positional and categorical predictors: (12). Position (X, Y): the planar position of each polygon centroid within the map; (13). Group: the aggregation of polygons in building blocks (Figure 6d). A "group" value is assigned to each polygon belonging to the same building block, while isolated buildings are grouped; (14). Class: defines a building of specific historical value, such as churches, palaces, castle, and tower; (15). Function: defines the civil or religious function of the buildings. In our cases, civil buildings were also grouped considering their approximative period of construction, derived by comparing multi-temporal historical maps; (16). Towers: includes a shape-based classification of the civil and religious towers, i.e., circular, rectangular, or octagonal shape. In our cases, we noticed that towers with similar shapes featured similar heights.
(13). Group: the aggregation of polygons in building blocks (Figure 6d). A "group" value is assigned to each polygon belonging to the same building block, while isolated buildings are grouped; (14). Class: defines a building of specific historical value, such as churches, palaces, castle, and tower; (15). Function: defines the civil or religious function of the buildings. In our cases, civil buildings were also grouped considering their approximative period of construction, derived by comparing multi-temporal historical maps; (16). Towers: includes a shape-based classification of the civil and religious towers, i.e., circular, rectangular, or octagonal shape. In our cases, we noticed that towers with similar shapes featured similar heights.  The computed predictor values, especially for the geometric and neighbour attributes, are strongly conditioned by the historical maps' level of detail and the consequent digitisation ( Figure 1). Based on the quality of the building footprints (i.e., depicting a single building or a group), attribute values can significantly vary. Their combination with further attributes has been introduced to support the predictions, especially when the digitised maps suffer from a poor representation of man-made structures.

Data Augmentation
Data augmentation is a technique used to increase the quantity of data and is based on generating either altered versions of the existing data or artificial data. It is most commonly used for dealing with overfitting as well as creating more sample data, e.g., in deep learning processes [54][55][56][57]. In data augmentation, sample sets are expanded, generating synthetic data through any geometric or colorimetric transformation [58,59].
When dealing with urban data and modelling applications, some building classes are commonly under-represented. In historical centres, this condition occurs with two main categories of buildings, i.e., religious structures and civil towers, both strongly typifying the shape and skyline of the cities. In similar situations, the learning process could be affected by under-represented classes. Hence, re-sampling classes distribution and randomly adding geometrically modified copies of the weakly represented structures could be necessary to process more balanced datasets and improve the model performances.
In this work, augmented buildings are positioned outside the city without overlapping existing ones. The feature extraction is then handled together with all buildings. The augmentation is based on the existing buildings' data, with modifications on their geometrical properties, including dimensions and orientation. The augmented data is used for only training.

Heights Prediction Metrics
Three different approaches are proposed for evaluating the quality of the inferred building heights.

Evaluation of the RMSE, MAE, and R 2 on Randomly Split Training and Test Data
In machine learning applications, the prediction quality considers how well a model performs on data not used when fitting the model. Therefore, data are commonly split into training and test datasets (typically 70% and 30%, respectively), where the training is used to estimate parameters of the predictive method and the test dataset for evaluating its accuracy. When data augmentation (Section 4.3) is used, those data falling in the test dataset are moved to the training dataset. Most used metrics for evaluating the quality of the models are the root mean square error (RMSE) (3), the mean absolute error (MAE) (4), and R 2 (coefficient of determination) (5). They are respectively defined as: Median of the height differences (ground truth vs. predicted) and standard deviation are also used and reported as evaluation metrics.
A more in-depth control of the machine learning prediction quality can be performed adopting-as a test set-a small subset of still existing buildings having the same shape and being present on different historical maps.

Single-View Metrology from Historical Images
Assessing the quality of the prediction of disappeared buildings is more complicated. If historical images are available, single-view metrology [60,61] and fundamental invariants of projective geometry [62] can be used. An essential property of projective geometry is that some measures are invariant to projective transformations. The cross-ratio invariant and image vanishing points can be used for deriving distance measurements from a single image. Knowing one reference distance (H), the method (Figure 7) implies that the height of the camera (H C ) and any other distance (H U ) between two planes perpendicular to the reference direction (v 3 ) can be derived. Two points B and T, lying on two planes P and P' perpendicular to the reference direction v 3 , are represented in image space by points b and t lying on the two planes defined by the two vanishing points v 1 and v 2 . The image point lies at the intersection of the line joining the corresponding points C (the camera centre) and C' with the vanishing line l v1v2 . The point C lies on a plane at a distance H C from the reference plane P. Under this configuration, the image points b, t, c, and v 3 are aligned along the vertical reference direction, and therefore they define a cross-ratio. The ratio also holds in object space with points B, T, C', and v 3 . Therefore, we can derive (6): where d(a, b) is the Euclidean distance between points a and b, measured in image space. Therefore, if we know a vertical reference height in the scene (e.g., a building, a person), we can derive a building's height and use this information to evaluate a prediction.

Accuracy Aims
The two considered urban scenarios present similar volumetric characteristics, peculiar to the construction techniques of northern Italy. The predominance of buildings with numerous pitched roofs and variable slopes makes these test cases quite tricky. For the variety of roof examples and the peculiarities of such stratified urban contexts, a mean absolute error (MAE) of about 2 m was considered a satisfying accuracy target for this work, in accordance with results presented in the literature [28,46]. This accuracy target considers that higher error values can be expected with a random test set since many ground-truth buildings could be temporally inconsistent or changed over time. More accurate results are contemplated for a subset of buildings, unaltered in the considered time-periods.

Results
In this section, regression model performances and results of the predictions on historical data are presented (Section 5.1). A more comprehensive investigation is reported for the Trento case study (Section 5.2). The second dataset, Bologna, shows the replicability of the proposed method (Section 5.3).

Regressors Evaluation
The actual topographic databases of both case studies (Section 3) were used, with the computed predictors (Section 4.2), to evaluate the performances of the selected regressor methods (Section 4.1). Data were randomly split into training and test sets and metrics computed. Tables 5 and 6 show the performances of the different height predictions over Trento and Bologna, respectively. In our tests, the Random Forest (RF) regressor proved to outperform the other algorithms with respect to all chosen metrics (Section 4.4).
Although RF metrics were quite close to our target accuracy (MAE~2 m), an authentic look at the predicted heights of specific buildings show the presence of gross errors in particular for the building classes with fewer samples, i.e., religious buildings and towers ( Figure 8).
A non-uniform composition of data is evident in terms of distribution, as shown in Figure 9 left and Table 7. Therefore, a data augmentation approach was applied for churches and towers to achieve a more balanced representation of all classes (Table 8, Figure 9 right).  A non-uniform composition of data is evident in terms of distribution, as shown in Figure 9 left and Table 7. Therefore, a data augmentation approach was applied for churches and towers to achieve a more balanced representation of all classes (Table 8, Figure 9 right).     The evaluation of the chosen regressor method was performed again on the new "augmented" dataset: results (Tables 9 and 10) show slight accuracy improvements and still prove that RF performs better than the others.  Aside from the slight improvement of the evaluation metrics, the adopted data augmentation approach's effectiveness is proven by reduced gross errors in the underrepresented classes and the more precise height predictions all over the city (Figure 10).

Inferring Building Heights from a Historical Map-Trento Case Study
The building footprints digitised in the four historical maps (1851, 1887, 1908, and 1936) of Trento and their predictors (Section 4.2) were used to infer building heights using the two outperforming predictor methods: Random Forest and Catboost. The actual topographic database was used to learn heights, although some modern buildings were removed beforehand from the input data to limit gross errors. Evaluation metrics to check predicted heights in the historical maps were computed on a smaller test set, including only some buildings still existing, having the same shape and recognisable in the different maps. Tables 11 and 12 show the height difference error calculated in this case, presenting the Random Forest and Catboost results, which demonstrated to be the best performing algorithms (Section 5.1). Again, the Random Forest confirmed to outperform the other algorithms.

Inferring Building Heights from a Historical Map-Trento Case Study
The building footprints digitised in the four historical maps (1851, 1887, 1908, and 1936) of Trento and their predictors (Section 4.2) were used to infer building heights using the two outperforming predictor methods: Random Forest and Catboost. The actual topographic database was used to learn heights, although some modern buildings were removed beforehand from the input data to limit gross errors. Evaluation metrics to check predicted heights in the historical maps were computed on a smaller test set, including only some buildings still existing, having the same shape and recognisable in the different maps. Tables 11 and 12 show the height difference error calculated in this case, presenting the Random Forest and Catboost results, which demonstrated to be the best performing algorithms (Section 5.1). Again, the Random Forest confirmed to outperform the other algorithms. As a further check of the prediction's quality, single-view metrology (Section 4.4.2) was applied to derive some ground-truth heights for disappeared buildings. Vanishing lines and cross-ratio invariants were used with some historical photos where a known height was measurable ( Figure 11). As a further check of the prediction's quality, single-view metrology (Section 4.4.2) was applied to derive some ground-truth heights for disappeared buildings. Vanishing lines and cross-ratio invariants were used with some historical photos where a known height was measurable ( Figure 11). Table 13 summarises the evaluation results adopting the presented procedure.  Figure 11. Two examples of single-view-metrology applied to historical photos in Trento to determine heights of buildings not present anymore in the actual topographic database.
Visual results of the 4D LoD1 reconstruction of Trento are presented in Figures 12  and 13. An example of texture mapping with a historical photo is shown in Figure 14.  Visual results of the 4D LoD1 reconstruction of Trento are presented in Figures 12 and 13. An example of texture mapping with a historical photo is shown in Figure 14.

Inferring Building Heights from a Historical Map-Bologna Case Study
The implemented method was also tested on two historical versions of the Bologna city centre (1884 and 1945). Among the transformations of the city in this time frame, the almost complete destruction of the medieval defensive walls and the heavy damages of war bombardments are the most relevant.
Starting from the city's actual topographic database and following the regressors evaluation (Section 5.1), Random Forest and Catboost methods were used to predict building heights in the historical maps. The predicted heights evaluation was performed, considering only some unchanged buildings as a test set (Tables 14 and 15). Figure 15 presents some general and detailed views of the 3D buildings reconstruction for the Bologna city centre. Table 14. Metric evaluation of the Random Forest prediction on the two historical datasets of Bologna, considering as a test set ten unaltered buildings digitised in both maps.

Inferring Building Heights from a Historical Map-Bologna Case Study
The implemented method was also tested on two historical versions of the Bologna city centre (1884 and 1945). Among the transformations of the city in this time frame, the almost complete destruction of the medieval defensive walls and the heavy damages of war bombardments are the most relevant.
Starting from the city's actual topographic database and following the regressors evaluation (Section 5.1), Random Forest and Catboost methods were used to predict building heights in the historical maps. The predicted heights evaluation was performed, considering only some unchanged buildings as a test set (Tables 14 and 15). Figure 15 presents some general and detailed views of the 3D buildings reconstruction for the Bologna city centre.

Discussion
The quality performance of several regressors for predicting height values from historical data was presented. Several evaluation approaches were proposed for tackling the issue of a lack of ground-truth information with historical data. The evaluation metrics proved that the Random Forest regressor inferred better building heights with regard to other algorithms.
Comparing the prediction results with the Random Forest algorithm in the historical datasets (Tables 11 and 14), a general worsening of the metrics can be noticed when the polygon's area is much different from the training data (Tables 1 and 3). Therefore, the prediction quality is conditioned by the informative level of the historical maps and size of digitised polygons.
The inferring methodology relies on some twenty predictors (Section 4.2). As typical in a machine learning application, a recursive feature elimination (RFE) approach can be employed to select and reduce the predictors. This technique helps remove irrelevant predictors, selecting only the most relevant ones and speeds up the overall prediction procedure. Figure 16 reports the predictors' importance for the Trento dataset. Some categorical features are more significant than other attributes. Therefore, an RFE approach was applied to optimise the algorithm's performance and reduce the number of predictors ( Figure 17). Results and comparisons for both datasets are presented in Tables 16 and 17.
Aside from no significant changes in the evaluation metrics, a general worsening in the predicted building heights' quality was verified by visually checking the results obtained after an RFE approach. Therefore, for the considered datasets, the RFE approach proved to be ineffective, and the different contributions of all predictors (Section 4.2) were shown to be relevant.

Discussion
The quality performance of several regressors for predicting height values from historical data was presented. Several evaluation approaches were proposed for tackling the issue of a lack of ground-truth information with historical data. The evaluation metrics proved that the Random Forest regressor inferred better building heights with regard to other algorithms.
Comparing the prediction results with the Random Forest algorithm in the historical datasets (Tables 11 and 14), a general worsening of the metrics can be noticed when the polygon's area is much different from the training data (Tables 1 and 3). Therefore, the prediction quality is conditioned by the informative level of the historical maps and size of digitised polygons.
The inferring methodology relies on some twenty predictors (Section 4.2). As typical in a machine learning application, a recursive feature elimination (RFE) approach can be employed to select and reduce the predictors. This technique helps remove irrelevant predictors, selecting only the most relevant ones and speeds up the overall prediction procedure. Figure 16 reports the predictors' importance for the Trento dataset. Some categorical features are more significant than other attributes. Therefore, an RFE approach was applied to optimise the algorithm's performance and reduce the number of predictors ( Figure 17). Results and comparisons for both datasets are presented in Tables 16 and 17.
Aside from no significant changes in the evaluation metrics, a general worsening in the predicted building heights' quality was verified by visually checking the results obtained after an RFE approach. Therefore, for the considered datasets, the RFE approach proved to be ineffective, and the different contributions of all predictors (Section 4.2) were shown to be relevant. Figure 16. Predictors importance of the Random Forest method in the Trento dataset. The most relevant are function (e.g., civil or religious), class (e.g., churches, palaces, castle), and distance (from a polygon's centroid from the nearest centroid). On the y-axis, the features importance score is given, i.e., the normalised average scores indicating the weighted variance decrease in a decision tree.    On the y-axis, the features importance score is given, i.e., the normalised average scores indicating the weighted variance decrease in a decision tree. Figure 16. Predictors importance of the Random Forest method in the Trento dataset. The most relevant are function (e.g., civil or religious), class (e.g., churches, palaces, castle), and distance (from a polygon's centroid from the nearest centroid). On the y-axis, the features importance score is given, i.e., the normalised average scores indicating the weighted variance decrease in a decision tree.    The implemented method, although reliable and feasible, still presents some issues to be tackled: (a). data preparation (i.e., the digitisation of historical maps) is demanding and time-consuming; (b). differences in the input data (i.e., different informative levels among the training data and the historical datasets) can affect the quality of the prediction; (c). the results can be influenced by the accuracy of the georeferenced maps, considering that positional attributes are included among the predictors; and (d). the method was tested on similar urban scenarios and using respective actual data as training. The applicability in different regional contexts and the prediction's quality employing other cities' training data need further investigations.
About the latter issue, some preliminary tests were conducted to verify if, in similar built environments, the procedure can return satisfactory results using different cities' training data. The Random Forest performances' evaluation are presented in Table 18 where a training using Trento's data is used to predict heights of Bologna footprints, and vice-versa (removing the positional attributes). Although quality metrics are relatively consistent (or even better) with the results reported in Tables 5 and 6, a visual check ( Figure 18) highlights a general wrong prediction of civil towers and religious buildings. Furthermore, significant errors can be noticed in sparse and isolated buildings outside the city centre, due to the implemented data augmentation technique (Section 4.3). Without positional attributes, the presence of an increased number of sparse polygons (towers and religious buildings) in the training data, proved to negatively condition the learning method. The implemented method, although reliable and feasible, still presents some issues to be tackled: (a). data preparation (i.e., the digitisation of historical maps) is demanding and timeconsuming; (b). differences in the input data (i.e., different informative levels among the training data and the historical datasets) can affect the quality of the prediction; (c). the results can be influenced by the accuracy of the georeferenced maps, considering that positional attributes are included among the predictors; and (d). the method was tested on similar urban scenarios and using respective actual data as training. The applicability in different regional contexts and the prediction's quality employing other cities' training data need further investigations.
About the latter issue, some preliminary tests were conducted to verify if, in similar built environments, the procedure can return satisfactory results using different cities' training data. The Random Forest performances' evaluation are presented in Table 18 where a training using Trento's data is used to predict heights of Bologna footprints, and vice-versa (removing the positional attributes). Although quality metrics are relatively consistent (or even better) with the results reported in Tables 5 and 6, a visual check ( Figure 18) highlights a general wrong prediction of civil towers and religious buildings. Furthermore, significant errors can be noticed in sparse and isolated buildings outside the city centre, due to the implemented data augmentation technique (Section 4.3). Without positional attributes, the presence of an increased number of sparse polygons (towers and religious buildings) in the training data, proved to negatively condition the learning method.   The quality of the prediction was further verified, removing augmented data with inverted trainings. Metrics and some visual results are presented in Table 19 and Figure 19. The quality of the prediction was further verified, removing augmented data with inverted trainings. Metrics and some visual results are presented in Table 19 and Figure  19.  The accuracy results show an opposite trend in the two datasets. Visual outcomes highlight an acceptable prediction for polygons inserted in building blocks and a slight improvement for isolated constructions. These preliminary experiments are promising for the generalisation of the method and its applicability in several contexts.

Conclusions
Multi-temporal (4D) versions of the same urban area are of beneficial interest for landscape and urban analyses. This work presented a methodology for the digital reconstruction of buildings in 4D with machine learning algorithms and historical data.
From digitised historical maps and information of the actual city situations, different regression algorithms were compared and employed for inferring missing building heights, using this information for the multi-temporal 3D reconstruction of two urban city centres.
The reliability of the proposed approach was verified, testing the method on different datasets and epochs. Multiple quality evaluation methods were also proposed to tackle the issue of missing ground-truth data. The achieved results proved to be consistent with our accuracy targets and the complexity of such historical urban contexts. The implemented method is flexible and extendable, relying mainly on geometric and neighbour characteristics derivable from the datasets and adaptable categorical data.
In future investigations, the method will be extended to tackle the several issues presented in the previous section, exploring: Figure 19. 3D view of the inferred building heights (pink) in Trento with respect to the ground truth data (white), using the actual Bologna dataset as training. Data augmentation was removed from the training set.
The accuracy results show an opposite trend in the two datasets. Visual outcomes highlight an acceptable prediction for polygons inserted in building blocks and a slight improvement for isolated constructions. These preliminary experiments are promising for the generalisation of the method and its applicability in several contexts.

Conclusions
Multi-temporal (4D) versions of the same urban area are of beneficial interest for landscape and urban analyses. This work presented a methodology for the digital reconstruction of buildings in 4D with machine learning algorithms and historical data.
From digitised historical maps and information of the actual city situations, different regression algorithms were compared and employed for inferring missing building heights, using this information for the multi-temporal 3D reconstruction of two urban city centres.
The reliability of the proposed approach was verified, testing the method on different datasets and epochs. Multiple quality evaluation methods were also proposed to tackle the issue of missing ground-truth data. The achieved results proved to be consistent with our accuracy targets and the complexity of such historical urban contexts. The implemented method is flexible and extendable, relying mainly on geometric and neighbour characteristics derivable from the datasets and adaptable categorical data.
In future investigations, the method will be extended to tackle the several issues presented in the previous section, exploring: (a). automatic methods and deep learning techniques for replacing the time-consuming digitisation procedure of historical maps; (b). the use of specific training data (e.g., prepared at a building-block level rather than using detailed cadastral maps) for historical datasets suffering from a low informative level; (c). the prediction response assigning a lower weight to positional attributes, for avoiding possible mismatches related to different-scale maps and georeferencing issues; and (d). the possible generalisation of the method, expanding the training set with data representative of different regional contexts, and applying the trained model in actual scenarios where no elevation data are available (e.g., remote areas).

Author Contributions:
The article presents a research contribution that involved authors in equal measure. F.R. supervised the overall work and reviewed the entire paper, writing the introduction, aim, and conclusions of the paper; E.M.F. dealt with the state of the art, methodology, data processing, and results. E.Ö. was mainly involved in the development of the regressor analysis, methodology development, and paper revision. All authors have read and agreed to the published version of the manuscript.

Funding:
The presented work belongs to the TOTEM project activities which are financially supported by Fondazione CARITRO (https://www.fondazionecaritro.it/).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.