A System for Monitoring the Environment of Historic Places Using Convolutional Neural Network Methodologies

This work aims to contribute to a better understanding of how public street spaces are used. (1) Background: With a multidisciplinary approach, the objective of this work is to propose an experimental method that is reproducible on a large scale. (2) Study area: The applied methodology uses artificial intelligence to analyze Google Street View (GSV) images at street level. (3) Method: The purpose is to validate a methodology that characterizes and quantifies the use (pedestrians and cars) of several squares in Rome belonging to different historical periods. (4) Results: Using machine vision techniques based on convolutional neural networks, a historical reading of selected squares is proposed, with the aim of interpreting the dynamics of use and identifying some ongoing critical issues. (5) Conclusions: This work validated the usefulness of a method that uses artificial intelligence to analyze GSV images at street level.


Introduction
Google Maps and Street View were not developed for scientific research; however, they open up interesting research possibilities in the urban environment. Google Street View (GSV), released in 2007, differs from traditional mapping software by directly capturing the visual appearance of places at ground level. By blending together images taken from different angles, Street View creates what appears to be a seamless tour of the city streets and can give the feeling of "being there" [1]. Although they were not created for research, GSV images have been studied in the computer vision community.
Other data sources are also available for studying and measuring the urban environment. For example, high-spatial-resolution remote sensing has been used to study street greenery in historic centers [12], to estimate building heights, to extract features of the historical urban landscape, and to classify roofs in the historic center of Rome [13].
However, high-spatial-resolution remote data are not always available. Furthermore, the profile view of streetscapes, which people experience at eye level, differs from the top view captured in remotely sensed images [14,1]. These differences can be overcome with manual inventories and field surveys; however, in situ data collection is laborious and time-consuming and is prone to detection errors, especially when carried out by non-experts or volunteers [15]. GSV data, in addition to being … GSV images are not always consistent over time and space; therefore, the selection of squares was also dictated by the availability and updating of the images online. The choice to acquire the images in two different periods (before Covid-19 and during the pandemic) is motivated by the need for a comparative picture that is as truthful as possible.
The images of each square have been sorted by historical period: medieval, Renaissance-Baroque, nineteenth century (Table 1).

Materials and Methods
As shown in Table 1, two representative sample images were collected and processed for each square. The images processed in this work through machine vision techniques are all virtual; the same method can also be applied to real images taken at a different time [40]. For temporal comparisons, it is important that the framing of the scene always remains the same [41,42,43]. The choice of the most suitable network for training in this specific application domain focused on convolutional neural networks (CNNs), a class of deep neural networks that can be viewed as regularized versions of multilayer perceptrons [27,28,29]. Deep learning is an important part of machine learning, and deep learning algorithms rely on neural networks. The general features of a machine learning network, and those specific to CNNs, are described below [33,34].

Neural Networks
Artificial neural networks are systems used in the field of artificial intelligence that are loosely inspired by the biological neural networks of animals [30]. Like biological neural networks, these systems base their operation on a learning ability that allows them to perform tasks using both pre-programmed knowledge and what they learn from the environment in which they operate [45]; therefore, they are generally not programmed for a specific task or with explicit rules. In image recognition, an artificial neural network first learns to identify objects from examples that have been identified and labeled manually [44,46]. The result of this learning is then used to identify specific objects in new images, completely different from those used to train the system.

Architecture of a neural network
The architecture of an artificial neural network is composed of artificial neurons, which mimic a biological neuron receiving several inputs: the inputs are combined with the neuron's internal state and an optional threshold through an activation function, and the output is produced by an output function. The network also consists of connections; each connection transmits the output of one neuron as an input to another neuron and is assigned a weight that determines its importance [35,36]. Each neuron can have multiple input and output connections. The propagation function computes the input of a neuron from the outputs of its predecessors and their connections as a weighted sum, to which a bias term can be added.
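The propagation function just described can be illustrated with a minimal Python sketch of a single neuron: a weighted sum of the predecessors' outputs, plus a bias, passed through an activation function. The sigmoid activation and the name `neuron_output` are illustrative assumptions; the text does not specify which activation function is used.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Propagation function (weighted sum of inputs plus bias) followed by the activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one connection weight")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Three predecessor neurons feeding one neuron
print(neuron_output([2.0, -4.0, 8.0], [0.5, 0.25, 0.125], bias=-1.0))  # 0.5
```

With these values the weighted sum plus bias is exactly 0, so the sigmoid returns 0.5.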

Perceptron
The perceptron is an algorithm for the supervised learning of binary classifiers in artificial neural networks [31,32,33]. A binary classifier is a function that decides whether an input, represented by a vector of numbers, belongs to a certain class. The perceptron is therefore a linear classifier: it makes its predictions with a linear prediction function that combines a set of weights with the feature vector. Single-layer perceptrons can learn only linearly separable patterns. Perceptrons with additional layers are used for more complex classification tasks, since they can solve problems that are not linearly separable.
The perceptron can be seen as an algorithm that receives as input a vector $\mathbf{x}$ of $m$ values $(x_1, x_2, \dots, x_m)$ and returns either 0 or 1 according to the following function:

$$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $\mathbf{w}$ is the vector of real-valued weights, $b$ is the bias, and $\mathbf{w} \cdot \mathbf{x}$ is the scalar product $\mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{m} w_i x_i$.
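A minimal sketch of the perceptron decision function, together with the classic perceptron learning rule (an assumption here, since the text does not describe a training procedure), shows the difference between linearly separable and non-separable patterns. The function names, learning rate and epoch count are hypothetical.

```python
def predict(w, b, x):
    """Perceptron decision rule: 1 if w . x + b > 0, otherwise 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Classic perceptron learning rule: adjust w and b only on misclassified samples."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = y - predict(w, b, x)  # -1, 0 or +1
            if delta != 0:
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:  # every sample classified correctly: stop early
            break
    return w, b

X = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Logical AND is linearly separable, so the rule converges
y_and = [0, 0, 0, 1]
w, b = train_perceptron(X, y_and)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]

# XOR is not linearly separable: no single-layer weights fit all four points
y_xor = [0, 1, 1, 0]
w2, b2 = train_perceptron(X, y_xor)
print([predict(w2, b2, x) for x in X] == y_xor)  # False
```

AND can be separated by a single line, so training converges; XOR cannot, which is why a second layer is needed for such problems.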

Learning
Learning is the phase in which a network improves its ability to process a type of data by analyzing samples of that same type of data. Learning involves adjusting the weights and thresholds to increase the accuracy of the output by minimizing the observed error. Learning is considered finished when examining additional samples no longer reduces the error significantly. Generally, even after the learning phase ends, the error never reaches 0; however, if it remains too high, the network probably needs to be redesigned. In practice, this is done by defining a cost function that is evaluated periodically during learning; learning continues as long as the output error keeps decreasing significantly. The learning rate defines the size of the corrective steps that the model takes to reduce the error at each observation. A high learning rate shortens learning time but lowers accuracy, whereas a low learning rate slows the process down but can potentially increase accuracy.
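The role of the cost function, the learning rate and the stopping criterion can be sketched with plain gradient descent on a one-weight linear model. The toy data, the tolerance and the mean-squared-error cost are assumptions chosen for illustration, not values from this work.

```python
def cost(w, data):
    """Mean squared error of the one-weight model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w, data):
    """Derivative of the MSE cost with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def gradient_descent(data, learning_rate, tol=1e-9, max_steps=10_000):
    """Stop when the cost no longer decreases significantly (this also triggers if it rises)."""
    w = 0.0
    prev = cost(w, data)
    for step in range(1, max_steps + 1):
        w -= learning_rate * gradient(w, data)  # step size set by the learning rate
        current = cost(w, data)
        if prev - current < tol:
            return w, current, step
        prev = current
    return w, prev, max_steps

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # hypothetical samples, roughly y = 2x
for lr in (0.01, 0.1, 0.25):
    w, final_cost, steps = gradient_descent(data, lr)
    print(f"lr={lr}: w={w:.4f}, cost={final_cost:.5f}, steps={steps}")
```

With 0.1 the weight converges in fewer steps than with 0.01; with 0.25 the first step overshoots the minimum, the cost rises and training stops at a poor weight, an extreme case of a learning rate that is too high.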
Optimizations based on algorithms such as Quickprop generally aim to speed up error minimization; other improvements aim to increase reliability. To avoid oscillations within the network, such as alternating connection weights, and to improve the convergence rate, an adaptive learning rate that increases or decreases as needed is used. Momentum balances the current gradient against the previous weight change, so that each weight adjustment depends to some degree on the previous one.
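The momentum update can be sketched as follows: the weight change at each step blends the current gradient with the previous change. The quadratic cost, learning rate and momentum coefficient are hypothetical, and the adaptive learning rate mentioned in the text is not modeled here.

```python
def gd_momentum(grad, w0, learning_rate=0.1, momentum=0.9, steps=500):
    """Gradient descent with momentum: each update combines the current gradient
    with a fraction of the previous weight change, damping oscillations."""
    w, change = w0, 0.0
    for _ in range(steps):
        change = momentum * change - learning_rate * grad(w)
        w += change
    return w

# Hypothetical cost f(w) = (w - 3)**2 with gradient 2 * (w - 3); its minimum is at w = 3
w_final = gd_momentum(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_final, 6))
```

Setting `momentum=0.0` reduces the update to plain gradient descent, which makes it easy to compare the two behaviors on the same cost.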

Convolutional Neural Network
In deep learning, a convolutional neural network (CNN) is a class of deep neural networks generally used to analyze images. CNNs are also known as shift invariant or space invariant artificial neural networks (SIANN), owing to their shared-weight architecture and translation invariance characteristics [37,38].
These networks are applied in image recognition, image classification, medical image analysis, natural language processing and financial time series. In a convolutional neural network, each neuron receives inputs from a certain number of positions in the previous layer. In a fully connected layer, each neuron receives input from every element of the previous layer, while in a convolutional layer neurons receive input only from a restricted area of the previous layer. This area is generally square, for example 10 × 10 pixels, and is called the neuron's receptive field, by analogy with the operating logic of the animal visual cortex [34,35].
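A minimal pure-Python sketch of a convolutional layer's core operation shows both ideas in the text: each output neuron sees only a small receptive field (here 2 × 2 rather than 10 × 10, to keep the example short), and all neurons share the same weights. The image and the edge-detecting kernel are made up for illustration.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (strictly, cross-correlation, as in most CNN libraries).
    Each output value depends only on a k x k receptive field of the input,
    and every output position reuses the same shared kernel weights."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows - k + 1):
        row = []
        for j in range(cols - k + 1):
            # Weighted sum over the receptive field anchored at (i, j)
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            row.append(s)
        out.append(row)
    return out

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge = [[-1, 1],
                 [-1, 1]]
print(conv2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The layer uses only the 4 shared kernel weights, whereas a fully connected layer mapping the 16 input pixels to the 9 outputs would need 16 × 9 = 144 weights.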
In a fully connected layer, the receptive field corresponds to the entire previous layer, while in a convolutional layer it is a subset of it. The area of the original image covered by the receptive field grows as the analysis moves to deeper layers of the network, because each convolution takes into account not only the value of a specific pixel but also the values of the pixels surrounding it.
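The growth of the receptive field with depth can be quantified with a standard recurrence over the kernel size and stride of each layer. The layer configurations below are hypothetical examples, not the layers of the network used in this work.

```python
def receptive_fields(layers):
    """Receptive field (in input pixels) after each layer, given (kernel, stride) pairs.
    Recurrence: r_l = r_{l-1} + (k_l - 1) * j_{l-1} and j_l = j_{l-1} * s_l,
    where j is the distance, in input pixels, between adjacent units of a layer."""
    r, jump = 1, 1
    sizes = []
    for kernel, stride in layers:
        r += (kernel - 1) * jump
        jump *= stride
        sizes.append(r)
    return sizes

print(receptive_fields([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]
print(receptive_fields([(3, 1), (3, 2), (3, 1)]))  # [3, 5, 9]
```

Three stacked 3 × 3 convolutions already cover a 7 × 7 area of the original image, and a stride of 2 makes the receptive field of later layers grow faster.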
Each neuron in the network generates an output by applying a specific function to the inputs arriving from its receptive field in the previous layer. This function is determined by a weight vector and a bias value, and learning proceeds by iteratively updating these weights and this bias. The weight vector and the bias are called filters and represent particular features of the input; these filters can be shared by different neurons, significantly reducing the amount of memory required for the computations. This characteristic is typical of convolutional neural networks (CNNs).
For both its performance and its suitability for this specific task, the YOLO v3 network was chosen; it uses the 53-layer Darknet-53 as its feature extractor. The two most important components of Darknet-53 are convolutional layers and ResNet-style residual connections. A 1 × 1 convolution can compress the number of channels of the feature map, reducing model computation and parameters. Stacking multiple 3 × 3 convolutions instead of using a single convolutional layer with a large filter increases nonlinearity, making the decision function more discriminative. Residual connections make the network deeper, faster and easier to optimize, with fewer parameters and less complexity than other models; they therefore address the degradation and training difficulties of deep networks. Darknet-53 performs five dimensionality reductions in total: at each one, the number of rows and columns of the feature map is halved, while its depth doubles compared with the previous one.
In this work, learning was divided into two categories:
• the first category is defined by real images taken in environments with different lighting and different scenarios;
• the second category is defined by virtual images taken from GSV.
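The downsampling pattern of Darknet-53 and the parameter saving of a 1 × 1 convolution can be sketched numerically. The 416 × 416 input and the 32 initial channels are assumptions taken from the commonly used YOLOv3 configuration, not values reported in this work; biases and batch-normalization parameters are ignored.

```python
def downsampling_schedule(size=416, channels=32, reductions=5):
    """Feature-map shape after each stride-2 reduction:
    height and width halve, depth (number of channels) doubles."""
    shapes = [(size, size, channels)]
    for _ in range(reductions):
        size //= 2
        channels *= 2
        shapes.append((size, size, channels))
    return shapes

for h, w, c in downsampling_schedule():
    print(f"{h} x {w} x {c}")

def conv_params(k, c_in, c_out):
    """Number of weights of a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

# Darknet-style residual block on 256 channels: 1x1 squeeze to 128, then 3x3 back to 256
bottleneck = conv_params(1, 256, 128) + conv_params(3, 128, 256)
# For comparison: two full 3x3 convolutions on 256 channels
plain = 2 * conv_params(3, 256, 256)
print(bottleneck, plain)  # 327680 1179648
```

The last shape is 13 × 13 × 1024: five halvings reduce 416 to 13 (a factor of 32) while the depth doubles five times. Squeezing the channels with a 1 × 1 convolution first cuts the weights of the block to less than a third.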
Both categories are used to train the deep learning network. The error between the predicted value and the real value is computed by a loss function (e.g. MSE, MAE or Huber loss); the error is backpropagated through the neural network and the weights of each convolutional layer are adjusted repeatedly until the training of the model is complete. In this process, outlined in Figure 3 (the algorithm pipeline), the loss function determines the direction in which the model is trained [46,47].
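The three loss functions named in the text can be written compactly as follows; the predictions, targets and Huber threshold `delta` are made-up values for illustration.

```python
def mse(pred, true):
    """Mean squared error: penalizes large errors quadratically."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def mae(pred, true):
    """Mean absolute error: penalizes all errors linearly."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def huber(pred, true, delta=1.0):
    """Huber loss: quadratic for |error| <= delta and linear beyond it,
    so outliers weigh less than in MSE."""
    total = 0.0
    for p, t in zip(pred, true):
        e = abs(p - t)
        total += 0.5 * e ** 2 if e <= delta else delta * (e - 0.5 * delta)
    return total / len(true)

pred = [2.5, 0.0, 2.0, 8.0]
true = [3.0, -0.5, 2.0, 4.0]
print(mse(pred, true), mae(pred, true), huber(pred, true))  # 4.125 1.25 0.9375
```

The single large error (4) dominates the MSE, while the Huber loss grows only linearly beyond `delta`, which makes training less sensitive to outliers.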
The processed images clearly show the recognition percentages of the detected objects for each category (Table 2). In many cases, when visibility is good, the results reach 100%; moreover, even where they fall below 50%, the category prediction is always correct.
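Turning detections into the pedestrian and car counts reported in the tables could look like the following sketch. The (label, confidence) tuple format, the sample detections and the 0.3 threshold are hypothetical; the text does not state which confidence threshold was used.

```python
from collections import Counter

# Hypothetical detector output: one (class_label, confidence) pair per bounding box
detections = [
    ("car", 0.98), ("car", 0.91), ("person", 0.47),
    ("car", 0.35), ("bicycle", 0.88), ("person", 1.00),
]

def count_by_class(dets, threshold=0.3):
    """Count detected objects per category, keeping boxes at or above the threshold."""
    return Counter(label for label, conf in dets if conf >= threshold)

print(dict(count_by_class(detections)))                 # all six boxes kept
print(dict(count_by_class(detections, threshold=0.5)))  # low-confidence boxes dropped
```

With the default threshold all six boxes are counted (3 cars, 2 people, 1 bicycle); raising it to 0.5 drops the two low-confidence boxes. Since correct categories are reported even below 50%, the choice of threshold directly affects the counts.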

Results
The processing results are summarized in Table 3:

The medieval squares
The results of the neural network processing are reported in Figures 4 and 5 and in Table 4. The two squares analyzed, Piazza dei Mercanti and Piazza in Piscinula, are located in the Trastevere district, on the right bank of the Tiber, south of Tiber Island; in the Middle Ages, the spaces of this area were subordinated to the activities of the port of Ripa Grande and included strategic points for controlling the Cestio and Rotto bridges [48].
Used as a place of sociability and collective recognition, the square played an essential role in the medieval urban scenario: people met in a space rich in history and art, sharing daily life in its various aspects (economic, religious and political). The images of Piazza dei Mercanti and Piazza in Piscinula confirm a change (Figures 3, 4, 5, 6, 7, 8, 9, 10). The processing of Piazza dei Mercanti in 2018 detected 29 parked cars, 2 vans, 1 bicycle and only 1 person. The situation was similar in 2020, with 25 parked cars and 2 umbrellas; given the square's area of about 711 square meters, we estimate that about 25% of it is taken up by parked cars.
Near Piazza dei Mercanti is Piazza in Piscinula (11th century). In this area, the processed images show some important medieval buildings with the features described by Sitte (1980) [49]: "closed", circumscribed spaces, a sort of large "excavated" courtyard that isolates the square from the rest of the town [49]. These characteristics also correspond to the concept of the medieval square as a finished place, distinct from the rest of the city [50,51].
The processing of Piazza in Piscinula confirms that parking takes up 25% of the surface of the square (1,100 m²); in addition, the near absence of people (3 in 2018 and 1 in 2020) is a clear sign of an incongruous use of public space.
Both squares, Mercanti and Piscinula, despite their unique architectural characteristics (consider the remains of medieval architecture), are in fact open-air car parks: a condition that clearly does not encourage outdoor activities or induce people to stop and linger during the day [25]. In this regard, Lauria (2017) [52] asserts that the improper use of space is an easily recognizable manifestation of residuality, yet one that is very difficult to define. Each inhabitant may judge a space "adequate" for some types of activities and "inadequate" for others, even within the same urban context. It all depends on the inhabitant's personal point of view, contingent or structural needs and wealth of experience. Judgments can also differ between "social categories": between tourists and residents, the elderly and the young, the rich and the poor, pedestrians and motorists, and so on. For example, in some cities of art, hurried tourism is such a critical element that it compromises the emotional relationship between public space and the resident community. Having said that, with due caution, it can be said that "inappropriate" activities limit the free use of urban space by most inhabitants, inhibiting the typical functions of public space (walking, meeting, talking, pausing, playing, etc.); in any case, "inappropriate" activities weaken the "social" potential of public space.
It is also true that deserted spaces, especially in the cities of art most affected by mass tourism, such as Rome and specifically its medieval district, could, if properly equipped, offer visitors a moment of rest, a respite from the incessant rhythm of the city; their tranquility could help visitors appreciate the beauty of the urban landscape.

The Renaissance squares
The results of the neural network processing are reported in Figures 6 and 7 and in Table 5. What is surprising in the processed images, besides the large size of these squares, is their shape (Figures 6, 7). In this regard, Sitte (1889) [53] points out that "in the Renaissance and even more so in the Baroque, architects were more concerned with the shape of the squares than with their functionality"; "they refer to ideal theoretical models in which beauty derives from the rational harmony of mathematical laws and the use of perspective". "Many times they are an element of pure urban beautification, rather than an element of utility" (Norberg, 1998).
Piazza Navona is located in what was once the Campo Marzio: a space with an unusual shape and great potential that fascinated many architects (Schultz 1980; 1998) [54,55]. The period between the Renaissance and the Baroque transformed the ancient space into a great scenography of power (the living room of Rome). Two famous architects, Bernini and Borromini, tested their skills against each other in a competition staged in the square and its buildings: the church of S. Agnese (1657), Palazzo Doria Pamphilj (1644), the Fountain of the Four Rivers (1650), the Fountain of Neptune and the Fountain of the Moor (1655). The end result was a fantastic and engaging outdoor living room, an enclosed stage of power [54].
The classified image shows in all its beauty the elongated shape of Piazza Navona, which covers approximately 12,000 square meters. Today the square is a large pedestrian area with outdoor activities, bars and restaurants that, at our latitudes, make it usable all year round. The processing, limited to only two access points to the square (as a pedestrian area), highlights the presence of people in groups (21 people in 2018 and 5 in 2020), which is certainly an important indicator of urban vitality. Franck and Stevens (2007) [56] observe that the activities taking place in a public space can essentially be reduced to two types: voluntary and induced. The former are determined by people's desires, e.g. to rest, read a book, run, meet someone or buy something; the latter are instead stimulated by the space, thanks to its characteristics and the opportunities it offers at a given moment, e.g. bathing in a fountain on a hot day, dancing on hearing music, sitting on a step or wall, or picking some flowers. In the first case, individuals look for a place suited to the functions they want to perform; in the second, it is the space that stimulates people to carry out certain actions. When voluntary and induced activities meet, a "magic" is created that gives the space a unique role: it favors opportunities for encounter, stimulates spontaneous events and allows the discovery of the unexpected, in a space where sensory and emotional relationships grow and intertwine.
To the right of the Tiber, at a distance of 1,800 meters from St. Peter's Square, lies one of the most important elements of the Baroque city: Piazza del Popolo (Figure 7). The processing highlights the layout of the square, which consists of two large linked structures: the gigantic elliptical basin and the Porta del Popolo to the north (for centuries the privileged access to the city from the north).
The processed images clearly show the vast basalt paving, covering an area of 11,000 square meters. At its center stand a travertine obelisk dating back to the pharaoh Ramses II and a fountain designed by Giacomo della Porta.
The square took on its current shape only in the early 19th century, with the work of the architect Giuseppe Valadier. Whereas in earlier times the square was a place to stop, Baroque squares became places to stroll.
Until 1998 this square was a large car park for the city; it later took on new life after being pedestrianized and is currently used occasionally for large events. Although the processing highlights the presence of people in this vast space of great historical and architectural value (26 people in 2018 and 13 in 2020), a feeling of emptiness is perceived, given the size of the square. In this regard, [58,59] state that the lack of specific functions, and the resulting scarce use, does not make public spaces particularly lively; on the contrary, such spaces can be monotonous, boring and unattractive for walking, meeting, talking, stopping or playing, even in the absence of traffic and parked cars.
The Center for Public Space Research, in its numerous studies on public space, has repeatedly underlined the relationship between life on the street, the number of people and events produced, and the time spent outdoors, demonstrating that fewer opportunities to spend time in the open air lead to a proportional decrease in the vitality of an area. Furthermore, by bringing together and "mixing" various types of activities in the same spatial context, the time spent outdoors, and therefore the social relations among the users of the square, can be intensified.

Squares of the nineteenth century
The results of the neural network processing are reported in Figures 8 and 9 and in Table 6. Since the nineteenth century, the industrial revolution has changed urban structures to facilitate the mobility of people and goods, as the technological evolution of transport systems forced cities to adapt their public spaces to railways, trams and cars [60]. The shape, size and furnishings of twentieth-century squares followed the functional needs that gradually emerged [61]. The scale factor is completely different from that of previous eras: no longer the human scale, but the vehicle scale. Squares are no longer felt as a living organism, in the medieval or Renaissance way, but as nodes of a large network connecting the main road routes [62]. In Rome, under the Giolitti government (1903–1921), new districts and new squares were built, including Piazza Mazzini, Piazza Cavour, Piazza Risorgimento and Piazzale degli Eroi.
Observing the images of Piazza Mazzini (Figs. 17, 18, 19, 20), the large, open, circular "swirling" space is striking, with a road ring that runs all around the square and leaves very little room for pedestrians crossing it (1 pedestrian in 2018 and 1 in 2020). At its center is a large flowerbed bordered by trees which, together with a large fountain called the "garden fountain", forms the centerpiece of the neighborhood.
Piazza Mazzini, despite being centrally located and having valuable architectural features, is a space entirely dedicated to cars (27 in 2018 and 15 in 2020), which effectively "block" the usability of the space. A post office, banks and shops overlook the square, which is surrounded by parking spaces; this condition reduces social relations and diminishes the "public" role of the urban space [63] by discouraging pedestrian mobility in the neighborhood. Cars, whether parked or moving, are a powerful obstacle to the usability and perception of public space.
Environmental factors, such as the noise and air pollution produced by car traffic, influence the accessibility of a square and the usability and pleasantness of its meeting and rest spaces; they can encourage or hinder the presence of inhabitants, the performance of certain activities, spontaneous aggregation, social cohesion and mutual understanding between people. As we approach the present day, the meaning of the word Piazza has been lost, replaced by words such as Piazzale, Largo, open space or large space.
Piazzale degli Eroi is an example of all this: it too was conceived around traffic and is crossed by thousands of cars every day (Figs. 21, 22, 23, 24).
The image processing shows an almost exclusive presence of cars and an almost complete absence of pedestrians. Almost imperceptible in the center of the square is the fountain of the Peschiera Aqueduct; inaugurated in 1949 and made of concrete and travertine, this fountain is currently suffering relentless deterioration, occupied by seagulls and garbage (bottles, weeds, various waste) that are disfiguring the recently restored work. This part of the city does not offer the ideal environmental conditions to be frequented or used appropriately by its inhabitants. Although human beings have a great ability to adapt, environmental factors such as traffic intensity, air quality, acoustic quality, and visual and light quality play an important role in people's choices about accessing urban spaces and, more generally, in determining the quality and livability of a public space.
In this context, there are some interesting initiatives promoted by the European Union, Member States and local authorities, aimed at reducing the urban environmental pollution generated by the smog and noise produced by cars. In particular, the European Union's "European Green Capital Award" each year rewards a city that promotes environmental improvement and sustainable development through development programs and interventions to protect and safeguard traditions. Among the projects, the work carried out by the Agència d'Ecologia Urbana de Barcelona is of interest: it develops sustainability plans and indicators aimed at reducing energy consumption and emissions and at improving the environmental conditions and livability of the city, addressing issues such as mobility, energy, waste, water, biodiversity and social cohesion. Since the last century, very few true squares have been built, only traffic hubs. Public buildings (ministries, post offices and courthouses, stations, theaters, etc.), instead of having a large open space in front, were lined up along the streets with no difference from residential houses. Many pre-existing squares have been used to serve public buildings, for example by facilitating access, allowing as many vehicles as possible to park without hindering traffic, and allowing rapid, almost instant movement for security reasons [62,63] (Insolera, 1980; Capasso et al. 2001).
Today new concepts of the piazza have taken over in which relationships are no longer physical: we meet, talk and play virtually, only through social networks. At the same time, shopping centers have become the new meeting places, that is, the new squares in which physical contact is restored to relationships [65] (Augé, 2011).
However, neither social networks nor shopping centers have actually replaced the values of the squares; rather, they have taken over some of their functions.

Conclusion
This work validated the usefulness of a method that uses artificial intelligence to analyze GSV images at street level in order to characterize and quantify pedestrians and cars in several squares of Rome from different historical periods. Furthermore, through a historical reading, an attempt was made to interpret the dynamics of current use and to identify some ongoing critical issues.
Despite some limitations arising from the small number of selected squares and from the moment of observation (i.e. time of day, day of the week, period of the year), the work allowed us to deepen our understanding of how some squares are used. In particular, it emerged that, despite their undisputed historical and architectural value, no single element makes a great square. Removing motorized traffic and the resulting air and noise pollution is certainly important. To restore vitality to the squares, however, the determining factor is what we have called "magic", that is, the harmony of a space enriched by a variety of functions. It would be advisable to create aggregation by exploiting the ability of urban space to attract people through the presence of different functions, at the same or different times of the day (a market in the morning, a refreshment point and meeting place in the afternoon, a favorite spot for young people's evening gatherings). This seemingly simple combination is unfortunately still difficult to achieve.