Article
Peer-Review Record

The Classification of Cultural Heritage Buildings in Athens Using Deep Learning Techniques

Heritage 2023, 6(4), 3673-3705; https://doi.org/10.3390/heritage6040195
by Konstantina Siountri 1,2,* and Christos-Nikolaos Anagnostopoulos 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 18 February 2023 / Revised: 26 March 2023 / Accepted: 7 April 2023 / Published: 13 April 2023
(This article belongs to the Section Cultural Heritage)

Round 1

Reviewer 1 Report

The paper is an excellent example of the application of DL technologies for studying and classifying historic buildings in urban centres. The research focuses on the historic centre of Athens using the YOLO algorithm. The authors, aware of the complexity of the topic, trained the algorithm to reduce errors.

The study is well described, both in its methodology and its application. The state of the art on the method is very well developed.

The state of the art on typological studies of architecture could be improved in the introduction (e.g. in Italy, such studies are very thorough).

Lines 49-50: the paper will be read by users worldwide who may not be familiar with the upcoming Greek bill for abandoned buildings. Explain better.

However, the logical steps leading to the statements in lines 315-316 should be better clarified.

In section 2.1, the leading designers working in Athens in the different periods identified could be mentioned to improve contextualisation.

A reference to future applications would enhance the results obtained in the conclusions. It must also be emphasised that, despite using such large-scale DL applications, the study of historic buildings must be detailed on a case-by-case basis and cannot be generalised. The discipline of architectural restoration warns against numerical standardisations and simplifications of the building characteristics of vernacular and historic architecture because each building and each traditional urban fabric is a unique case.

I congratulate the authors and invite them to continue with further developments. The paper can be accepted after minor revisions.

Author Response

Reviewer 1

 

 

Reviewer’s comment

Authors’ Response

The paper is an excellent example of the application of DL technologies for studying and classifying historic buildings in urban centres. The research focuses on the historic centre of Athens using the YOLO algorithm. The authors, aware of the complexity of the topic, trained the algorithm to reduce errors.

The study is well described, both in its methodology and its application. The state of the art on the method is very well developed.

We thank the reviewer for his/her general comments, and we provide our response below for every point mentioned.

The state of the art on typological studies of architecture could be improved in the introduction (e.g. in Italy, such studies are very thorough).

 

We have added in the introduction the following paragraph:

Regarding the typological studies of architecture that support the classification of buildings, many studies have indicated ways to record and categorize buildings at a multinational level (1), (2) or, more specifically, in Greece and in Athens (3), (4).

 

1.      Summerson, J. (1992). The Classical Language of Architecture. The MIT Press.

2.      Balafoutis, T., & Zerefos, S. (2018). A database of architectural details: The case of neoclassical façades elements. In Proceedings of the International Conference—BRAU4, Biennial of Architectural and Urban Restoration, Athens, Greece (pp. 15-30).

3.      Katsibokis, G. (2013). Ktiriothiki: The architectural heritage of Athens, 1830–1950. Journal of Modern Greek Studies, 31(1), 133-149.

4.      Biris, M. (2003). Athinaiki Arhitektoniki 1875–1925 [Athenian Architecture 1875–1925]. Athens: Melissa.

 

Lines 49-50: the paper will be read by users worldwide who may not be familiar with the upcoming Greek bill for abandoned buildings. Explain better.

 

We have added in the introduction the following paragraph:

More specifically in the context of the upcoming Greek bill concerning the arrangements for the abandoned and vacant properties as well as the intervention procedures for their restoration and reuse by the private sector, under conditions of legal certainty and with fast procedures, or the European initiative Renovation Wave…

 

The logical steps leading to the statements in lines 315-316 should be better clarified.

 

The whole paragraph was removed according to the comments of Reviewer 3.

In section 2.1, the leading designers working in Athens in the different periods identified could be mentioned to improve contextualisation.

The leading designers were added for Neoclassical architecture, Eclecticism and Interwar architecture. We explain that this is not possible for the postwar period.

 

A reference to future applications would enhance the results obtained in the conclusions.

It must also be emphasised that, despite using such large-scale DL applications, the study of historic buildings must be detailed on a case-by-case basis and cannot be generalised. The discipline of architectural restoration warns against numerical standardisations and simplifications of the building characteristics of vernacular and historic architecture because each building and each traditional urban fabric is a unique case.

We have added in the Conclusion:

Considering the above, the proposed method implementing the YOLO object detection system saves time compared to traditional manual techniques of CH classification and motivates us to extend our work in the future using photographs taken by UAVs and to expand our study area to more classes and cities. More specifically, with the use of drones, the evaluation of an area can be carried out in a limited period of time in order to identify the building infrastructure of the building blocks under consideration and to give approximate results that indicate trends that can inform specific strategic decisions. This cannot in any case concern final decisions, e.g. demolition, on specific buildings, especially historical ones, as it is known that each monument is a special case that requires an individual and in-depth examination. Generalization and classification can serve large-scale supervision, which is deemed necessary for the formulation of policies that will favor holistic urban studies, the identification of incentives and financial tools for conservation and renovation, etc., at the municipal, regional and national levels. Nevertheless, the intervention in each individual unit remains, always in accordance with the principles of restoration and the relevant legislation, an issue that requires specialized scientific study.

 

 

Reviewer 2 Report

Dear Colleagues,

 

I hope you are well. Congratulations on all this work that you are presenting in such an extensive way. It is very well explained and structured, but I will leave you with some comments that I consider relevant for improving your manuscript.

I think that a reference to the UNESCO Recommendation on the Historic Urban Landscape is missing; likewise, it is not at all clear how this method can be applied in urban conservation and planning. I suggest reading the works of John Pendlebury and Loes Veldpaus, from Newcastle University, and Ana Pereira Roders, from TU Delft.

Pendlebury, John, Mark Scott, Loes Veldpaus, Wout van der Toorn Vrijthoff, and Declan Redmond. 2020. “After the Crash: The Conservation-Planning Assemblage in an Era of Austerity.” European Planning Studies 28 (4): 672–90. https://doi.org/10.1080/09654313.2019.1629395.

Veldpaus, Loes, and John Pendlebury. 2019. “Heritage as a Vehicle for Development: The Case of Bigg Market, Newcastle upon Tyne.” Planning Practice & Research, July, 1–15. https://doi.org/10.1080/02697459.2019.1637168.

Ginzarly, Pereira Roders, A., & Teller, J. (2019). Mapping historic urban landscape values through social media. Journal of Cultural Heritage, 36 (March-April), 1–11. https://doi.org/10.1016/j.culher.2018.10.002

Other interesting readings that also address urban heritage and its transformations can be found in connection with the Deep Cities project (https://curbatheri.niku.no/):

Fouseki, Kalliopi, Torgrim Sneve Guttormsen, and Grete Swensen. 2021. “Heritage and Sustainable Urban Transformations: Deep Cities.” In Heritage and Sustainable Urban Transformations: Deep Cities, edited by K Fouseki, T Guttormsen, and G Swensen, 1–15. London; New York: Routledge.

Palaiologou, Garyfalia, and Kalliopi Fouseki. 2018. “New Perspectives in Urban Heritage – Theory, Policy and Practice.” The Historic Environment: Policy & Practice 9 (3–4): 175–79. https://doi.org/10.1080/17567505.2018.1525949.

Those manuscripts could be helpful for linking the method with current thinking in urban planning.

On the other hand, in the introduction I think that the section on objectives should be revised, defining what criteria will be used to define objective c) - see lines 111-115.

In line 116 I would say "the structure of the article".

It would be good to indicate in the literature review not only the data of each project or work that is mentioned, but also to follow a critical thread of argumentation about what they imply for your own work. This is good benchmarking, but it lacks a more critical tone that shows what you are taking from each and adding to your knowledge.

In line 1068, I suggest explaining why it is relevant that some ancient buildings are concentrated, and I would relate that to the HUL approach mentioned in the references that I suggest reading and including.

In my view, the definition of YOLO should be introduced earlier, in the introduction, and must be revisited in the discussion, not just in the conclusion.

Table 6. I think that the mix of old doors, balconies and windows in the same grid as buildings may lead to confusion; my suggestion is to make two tables.

In any case, it is a serious and interesting work whose methodology is well explained. The different uses it can have or how to scale it up to other places, etc., could be discussed, but in general terms, it is very good.

 

Congratulations.

Author Response

Reviewer 2

 

 

Reviewer’s comment

Authors’ Response

I hope you are well. Congratulations on all this work that you are presenting in such an extensive way. It is very well explained and structured, but I will leave you with some comments that I consider relevant for improving your manuscript.

 

We thank the reviewer for his/her general comments, and we provide our response below for every point mentioned.

I think that a reference to the UNESCO Recommendation on the Historic Urban Landscape is missing; likewise, it is not at all clear how this method can be applied in urban conservation and planning. I suggest reading the works of John Pendlebury and Loes Veldpaus, from Newcastle University, and Ana Pereira Roders, from TU Delft.

We have added in the introduction the following paragraph:

Finally, the classification of buildings is a key source of knowledge in urban and spatial planning. Until the beginning of the 21st century, the protection of the wider historical environment was studied separately from the development of spatial planning. For this reason, following the Declaration of Amsterdam (1975), the Council of Europe has tried to promote a more comprehensive approach, together with the UNESCO Recommendation (2011) on the Historic Urban Landscape (HUL). HUL supports a holistic approach that encourages social involvement and community education in order to recognize and maintain the diversity of CH which, although it can be transformed over time, helps to maintain the physiognomy of the place (1), (2), (3). In this context, urban development is promoted in terms of improved quality of life and sustainability.

 

1.      Pendlebury, John, Mark Scott, Loes Veldpaus, Wout van der Toorn Vrijthoff, and Declan Redmond. 2020. “After the Crash: The Conservation-Planning Assemblage in an Era of Austerity.” European Planning Studies 28 (4): 672–90

2.      Veldpaus, Loes, and John Pendlebury. 2019. “Heritage as a Vehicle for Development: The Case of Bigg Market, Newcastle upon Tyne.” Planning Practice & Research, July, 1–15.

3.      Ginzarly, Pereira Roders, A., & Teller, J. (2019). Mapping historic urban landscape values through social media. Journal of Cultural Heritage, 36 (March-April), 1–11.

 

In the introduction I think that the section on objectives should be revised, defining what criteria will be used to define objective c) - see lines 111-115.

We have added the following:

c) the drawing of useful conclusions for the further study of cultural heritage, e.g. the identification of concentrations of historic buildings, the approximate dating of these concentrations, the extraction of statistics and the formulation of short-term and long-term renovation policies that will ultimately lead to more in-depth studies of the existing condition, restoration and conservation.

 

In line 116 I would say "the structure of the article".

 

We made the change according to the comment.

It would be good to indicate in the literature review not only the data of each project or work that is mentioned, but also to follow a critical thread of argumentation about what they imply for your own work. This is good benchmarking, but it lacks a more critical tone that shows what you are taking from each and adding to your knowledge.

 

We have reorganized and revised the Related Work section.

 

In line 1068, I suggest explaining why it is relevant that some ancient buildings are concentrated, and I would relate that to the HUL approach mentioned in the references that I suggest reading and including.

 

The HUL approach was added.

 

Reference:

Palaiologou, Garyfalia, and Kalliopi Fouseki. 2018. “New Perspectives in Urban Heritage – Theory, Policy and Practice.” The Historic Environment: Policy & Practice 9 (3–4): 175–79.

In my view, the definition of YOLO should be introduced earlier, in the intro, and must be re-taken in the discussion, not just in the conclusion.

 

We moved the section on deep learning and YOLO before the presentation of the Related Work.

Table 6. I think that the mix of old doors, balconies and windows in the same grid as buildings may lead to confusion; my suggestion is to make two tables.

 

Table 6 was split into two separate tables.

The different uses it can have or how to scale it up to other places, etc., could be discussed, but in general terms, it is very good.

We explain more in the section of conclusions:

 

Considering the above, the proposed method implementing the YOLO object detection system saves time compared to traditional manual techniques of CH classification and motivates us to expand our study area to more classes and cities, as the capital, Athens, has influenced over time the architecture of the entire Greek region. Besides, Neoclassicism, Eclecticism and the Modern movement are pan-European architectural styles. Also, in order to extend our work in the future, we are going to use photographs taken by UAVs, as with the use of drones the evaluation of an area can be carried out in a limited period of time. In this way we can identify the building infrastructure of the building blocks under consideration and give approximate results that indicate trends that can inform specific strategic decisions.

 

Author Response File: Author Response.docx

Reviewer 3 Report

The submission is interesting and deals with a relevant topic in cultural heritage.

This work has the potential to contribute to research. It helps to introduce new digital methods into DH and architecture.

 

The overall idea of the paper is that buildings are classified into architectural styles based on a photo by using a large CNN for image analysis.

 

However, the work still has some drawbacks. These need to be addressed.

 

It would have been more convincing to use at least two algorithms for comparison for all classification tasks. This should have been at least a Yolo with another configuration, another deep model (another Yolo version), or a completely different system such as a classifier based on HOG features.

 

There are already newer models by now, but since this work contributes to the application of CNNs to a domain outside IT, it is Ok to use Yolo.

 

Section 2.1 and also throughout the paper: how was the assignment of building to a style done? Do experts agree and how much? At one point, there is a relaxation of the assignment due to the results of the classifier. This needs more discussion.

 

Dataset: how are the 5000 images distributed over the different sources? How many came from where? Was Google Street View used?

Using different sources might increase the robustness of the classification. Is it possible to classify only images from one source once, to prove that?

Can the data be published?

The issues illustrated in Figures 4 and 5 are not elaborated in the table. How are the elements distributed over the pictures (Table 2)? Why are windows not labelled in Figure 4?

 

The authors should also consider combining the tasks. Building a classifier with 8 outputs, including the styles and the parts, may lead to better overall performance.

 

Paper organization:

Related Work refers to Yolo and CNNs a lot. The paragraphs in the section “Methodology” explain these concepts. They need to be moved to a location before the Related Work.

 

Related Work appears to be completely randomly ordered. Each paper is viewed only individually. You need to create a flow. Some work uses SIFT, others deep methods. Some work on the building level and some on the part of the façade level. Also compare the sizes of the data sets and the results.

Is the paper on 3D models helpful and necessary for your point?

The short description of Teruggi cannot be understood.

There is lots to do in this section to make it valuable for the reader. We need more analysis instead of a list of abstracts.

 

Table 1: How many parameters does the resulting Yolo have? Maybe these are way too many for 5 classes and 5000 images. As a rule of thumb, there should not be more parameters for training than images.

 

“a total loss” -> explain better why you mention this

The F1 measure in training and test set should be given for all experiments, best solution would be one table.

 

Conclusions: “critics argue” give a reference

“new cognitive area” -> strange, maybe a new scientific cooperation?

“biased datasets” -> explain better. What kind of bias can be part of such a dataset? Bias through the expert labelling, for example?

 

Overall, the paper is well written

 

Further issues:

 

There is a section 1 twice.

 

Convolution neural -> Convolutional neural

 

Often, decimal comma is used. Change to point and avoid point in other numbers. Also in the diagram figures.

 

google street -> Google Street

Author Response

 

Reviewer 3

 

Reviewer’s comment

Authors’ Response

The submission is interesting and deals with a relevant topic in cultural heritage.

This work has the potential to contribute to research. It helps to introduce new digital methods into DH and architecture.

The overall idea of the paper is that buildings are classified into architectural styles based on a photo by using a large CNN for image analysis.

However, the work still has some drawbacks. These need to be addressed.

 

We thank the reviewer for his/her general comments, and we provide our response below for every point mentioned.

It would have been more convincing to use at least two algorithms for comparison for all classification tasks. This should have been at least a Yolo with another configuration, another deep model (another Yolo version), or a completely different system such as a classifier based on HOG features.

There are already newer models by now, but since this work contributes to the application of CNNs to a domain outside IT, it is Ok to use Yolo. 

 

The overall aim of our paper is to investigate how Deep Learning (DL) methodologies can be introduced into the task of building classification. HOG feature classification is a classical image processing method for high-level classification of objects (e.g. identification of a building in a general scene). However, for intra-class classification such as the one in our work (i.e. identification of a specific type of building), it is not the optimum method, especially in a complex problem such as the one addressed in this paper, where classification should be performed at a higher level, that is, in a qualitative rather than quantitative manner. Whether by expert observation or by computer science tools, identifying a building typology based exclusively on its façade is a complex problem, even for experts.

Moreover, the comparison of different types of DL schemes is not within the scope of this article. Instead, we chose to explore the abilities of YOLO across various architectures/models and experiments to draw useful and comparable conclusions among them. We are aware that YOLO is not the newest model in DL, but we chose it because individual building classification based on structural typology, as addressed in our paper, has not, to our knowledge, been previously reported.

Section 2.1 and also throughout the paper: how was the assignment of building to a style done? Do experts agree and how much? At one point, there is a relaxation of the assignment due to the results of the classifier. This needs more discussion.

 

There are two safeguards for the classification of buildings, at the training level and at the results evaluation level:

1. All images of historic buildings are of structures listed by the Ministry of Culture and the Ministry of Environment and Energy (the “declarations” include brief documentation) or are included in period-specific platforms such as ModMov of ELLET's Interwar research program.

2. Our team members have specialized studies in architectural heritage and 20 years of experience in the field, and work at the Ministry of Culture in the department responsible for the protection and management of the cultural heritage of the Athens area (1830 and after).

 

Dataset: how are the 5000 images distributed over the different sources? How many came from where? Was Google Street View used?

 

On page 12 we added Table 1, which describes in detail how the data from our sources are distributed to form the initial dataset.

Using different sources might increase the robustness of the classification. Is it possible to classify only images from one source once, to prove that?

 

We have explained the following more clearly:

The first test was carried out with 500 photos of the interwar-eclectic and interwar building classes, which are listed mainly in the ModMov (Modern Movement) database dedicated to the examination of the Interwar period [13]; 100% of the photos were retrieved from Google Street View. The success rate was 76.8% (Table 7).
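
The single-source check raised in the reviewer's question can be sketched as a per-source accuracy breakdown. The snippet below is only a minimal illustration and not the authors' evaluation code; the record layout, the source names and the labels are hypothetical.

```python
from collections import defaultdict


def accuracy_by_source(records):
    """Compute classification accuracy separately for each image source.

    `records` is an iterable of (source, true_label, predicted_label)
    tuples; this layout is an assumption made for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, true_label, predicted_label in records:
        total[source] += 1
        if true_label == predicted_label:
            correct[source] += 1
    # Every source in `total` has at least one record, so no division by zero.
    return {source: correct[source] / total[source] for source in total}


# Hypothetical records, not taken from the paper's dataset.
example_records = [
    ("google_street_view", "interwar", "interwar"),
    ("google_street_view", "interwar", "eclectic"),
    ("ministry_archive", "neoclassical", "neoclassical"),
]
result = accuracy_by_source(example_records)
print(result)
```

A large gap between sources in such a breakdown would suggest that the classifier depends on source-specific image properties rather than on the architectural style itself.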

Can the data be published?

 

Since a large part of the dataset consists of records from official organizations of the Greek state or from private/institutional collections, it is not possible to make the whole dataset publicly available. However, it is possible to publish the part of the archive that was produced by the authors' own work, so that the reader can get a first impression of the research data. Therefore, a part of the data can be downloaded from here (link ). This link was added to the article on page 12.

In the paper we note that the full dataset can be made available upon request to the authors of the article.

 

 

The issues illustrated in Figures 4 and 5 are not elaborated in the table. How are the elements distributed over the pictures (Table 2)? Why are windows not labelled in Figure 4?

The authors should also consider combining the tasks. Building a classifier with 8 outputs, including the styles and the parts, may lead to better overall performance.

We have explained the procedure more clearly:

The classification of corbels as independent entities was successful. However, corbels had been included in the initial training of balconies (Figure 5), and for that reason we realized that we would have to decide on all the elements of the categorization and all the final classes from the beginning of the study. Two facts led us to the idea of testing YOLO's ability on the whole façade of the building being examined: a) a parallel study showed that the set of elements that affect the classification of a building into 3 classes (neoclassical, eclectic, interwar) numbers 42, and b) along the way we might want to increase the number of classes (as we finally did).

Also, as the elements to be recognized increase, so does the number of images used as training data for the convolutional neural network. Therefore, the whole procedure was becoming more and more difficult to implement. A large amount of input data entails covering all possible occurrences of the elements of interest.

Finally, as the confidence score was already quite high (>80%), Training Phase 1 encouraged us to be more demanding and to attempt the classification using images of whole façades rather than of separate morphological elements of the buildings (Training Phase 2).

 

 

Paper organization:

Related Work refers to Yolo and CNNs a lot. The paragraphs in the section “Methodology” explain these concepts. They need to be moved to a location before the Related Work.

 

 

The comment is addressed.

 

Related Work appears to be completely randomly ordered. Each paper is viewed only individually. You need to create a flow. Some work uses SIFT, others deep methods. Some work on the building level and some on the part of the façade level.

 

We rearranged the papers in the related work in chronological order (from older to more recent) and tried to follow your indications. We also added brief comments on most of the papers referred to in the related work.

Also compare the sizes of the data sets and the results.

It is not our intention to perform dataset comparisons or a critical review of the reported results and declare a winning methodology. It is evident that the number and quality of testing examples have a direct effect on overall performance, and therefore the results of the reported papers cannot be directly compared.

Is the paper on 3D models helpful and necessary for your point?

The short description of Teruggi cannot be understood.

There is lots to do in this section to make it valuable for the reader. We need more analysis instead of a list of abstracts.

 

The reviewer is correct

Table 1: How many parameters does the resulting Yolo have? Maybe these are way too many for 5 classes and 5000 images.

As a rule of thumb, there should not be more parameters for training than images.

 

The parameters of the YOLO architecture are described in detail in Tables 8, 10 & 12. Moreover, we added an image that portrays the architecture of YOLO; see Tables 5 & 6, page 20.

 

However, we do not agree with the rule of thumb that the reviewer proposes. This rule of thumb is useful for non-deep-learning ML techniques, but in DL there are no specific guidelines for how many samples are needed (or at least it depends on the specific problem). DL networks are routinely trained with far fewer total samples than the number of weights in the network. For instance, the original ImageNet model (Krizhevsky et al., 2017) has 60 million parameters, while the ImageNet dataset on which it was trained had about 1.3 million training images.

Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017-05-24). "ImageNet classification with deep convolutional neural networks" (PDF). Communications of the ACM. 60 (6): 84–90. doi:10.1145/3065386. ISSN 0001-0782
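
The arithmetic behind this counterexample can be made explicit. The short sketch below only divides the two figures quoted in the response; the function name is introduced here for illustration and is not part of the paper.

```python
def params_per_sample(num_parameters: int, num_samples: int) -> float:
    """Return the number of trainable parameters per training sample."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return num_parameters / num_samples


# Figures quoted in the response: 60 million parameters and
# about 1.3 million training images.
imagenet_ratio = params_per_sample(60_000_000, 1_300_000)
print(f"parameters per training image: {imagenet_ratio:.1f}")
```

With these figures the model has roughly 46 parameters per training image, well above the one-parameter-per-image bound that the rule of thumb implies.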

“a total loss” -> explain better why you mention this

The F1 measure in training and test set should be given for all experiments, best solution would be one table

Loss is the loss function that is optimized, and its diagram is usually presented when assessing deep learning techniques.

In any case, in the revised version of the paper, we provide the F1 metrics in Tables 8, 10 & 12 to better demonstrate the performance of our model.

In addition, on page 14, we added a few lines to define the loss function.
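
For reference, the F1 measure the reviewer requests is the harmonic mean of precision and recall. The sketch below uses the standard definition; the counts in the example are hypothetical and are not results from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Compute F1 as the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0  # Defined as 0 when there are no true positives.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one class: 80 correct detections,
# 20 false detections and 20 missed buildings.
example_f1 = f1_score(80, 20, 20)
print(f"F1 = {example_f1:.2f}")
```

Reporting this value per class for both the training and the test set, as the reviewer suggests, makes overfitting visible as a gap between the two.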

 

“new cognitive area” -> strange, maybe a new scientific cooperation?

We agree with the reviewer.

Conclusions: “critics argue” give a reference

 

We have become more precise:

However, critics argue that analyses for architectural style classification based on data extracted from the internet can perpetuate biases and inaccuracies*. For that reason, deep learning algorithms trained on biased datasets can result in incorrect classifications, particularly for styles that are underrepresented in the training data.

* Ginzarly, Pereira Roders, A., & Teller, J. (2019). Mapping historic urban landscape values through social media. Journal of Cultural Heritage, 36 (March-April), 1–11. https://doi.org/10.1016/j.culher.2018.10.002

“Social media provides big data for researchers to perform real-time analytics, as digital ethnographers, on what places and attributes people value in the historic urban landscapes they live or visit, enough to share with their social network. However, the use of these data to further our knowledge on heritage and their values, or to support heritage planning and management is still very limited…

…Results showed that the different analyses complement one another to eventually provide insights into everyday encounters with the historic urban landscape. They also show the difference between experts’ and users’ documentation and characterization languages when defining heritage. When the first apply domain-specific classification models, the latter express personal reflections without following a specific hierarchy or a closed categorical system…”

 

“biased datasets” -> explain better. What kind of bias can be part of such a dataset? Bias through the expert labelling, for example?

 

Yes, we refer to bias through expert labelling.

Overall, the paper is well written

Further issues

There is a section 1 twice.

Convolution neural -> Convolutional neural

Often, decimal comma is used. Change to point and avoid point in other numbers. Also in the diagram figures.

google street -> Google Street

 

The comments are addressed.

 


Author Response File: Author Response.docx
