Image Processing Techniques for Concrete Crack Detection: A Scientometrics Literature Review

Abstract: Cracks in concrete surfaces are among the most prominent causes of the degradation of concrete structures such as bridges, roads, and buildings. Hence, it is crucial to detect cracks at an early stage when inspecting the structural health of concrete structures. To overcome the drawbacks of manual inspection, Image Processing Techniques (IPTs), especially those based on Deep Learning (DL), have been investigated in recent years. Owing to the groundbreaking development of this field, researchers have devoted their efforts to detecting cracks using DL-based IPTs, and the resulting techniques have answered many challenging problems. However, to the best of our knowledge, this field still lacks a state-of-the-art systematic review that presents both a scientometric analysis and a critical survey of the existing works to document the research trends and summarize the prominent IPTs for detecting cracks in concrete structures. This article therefore provides a systematic review of the relevant literature, presenting both scientometric and critical analyses of the papers published in this research area. The scientometric data extracted from the articles are analyzed and visualized with the VOSviewer and CiteSpace text mining tools across several parameters. Furthermore, this article surveys research from all over the world by highlighting and critically analyzing the essence of some of the most influential papers. Finally, this review poses several common questions and extracts their answers from the analyzed papers to highlight various features of the utilized methods.


Introduction
A crack in a concrete surface (e.g., a bridge, road, or wall) is a very narrow gap between two sides of the surface that generally appears when the surface is slightly damaged. Cracking in concrete surfaces is practically inevitable and can occur for various reasons, such as deformation of the concrete structure, reaction of salts contained in the soil with the concrete surface, thermal shrinkage, overloading, and so on. Concrete infrastructure, especially in South Korea, is quite likely to crack: the percentage of aged (more than 30 years old) reinforced concrete structures was estimated to be about 3.8% in 2014, and this is predicted to jump to 13.8% and 33.7% in 2024 and 2029, respectively [1]. These cracks can cause deadly accidents as well as the expenditure of huge amounts of money on the maintenance and repair of concrete structures. Crack detection at an early stage is therefore essential; it includes inspecting as well as evaluating the structural health and serviceability of concrete structures. For many years, manual inspection was the common, traditional method for detecting cracks in concrete structures. However, manual inspection lacks both efficiency and accuracy. Moreover, it is time-consuming, arduous, and expensive, because inspectors detect cracks with only their own vision while traversing the structures. Realizing the drawbacks of manual inspection and the advancement of automation technologies, Ho et al. in 1990 [2] introduced image-based methods for detecting cracks in concrete structures automatically. Owing to the advantages of vision-based algorithms over manual inspection, these algorithms have gained vast popularity among both engineers and researchers in recent years.
Hence, researchers from all over the world are now devoting their efforts to developing and utilizing vision-based automated crack detection algorithms.
The primary steps for detecting cracks include acquiring the images, preprocessing them, and finally detecting or classifying the cracks. The literature shows that different types of images, such as camera images [3], Infrared Ray (IR) images [4], Ground Penetrating Radar (GPR) images [5], ultrasonic images [6], etc., are being utilized for detecting cracks. To extract the necessary features from the acquired images, as well as to remove noise caused by shadows, poor illumination conditions, and thin cracks, researchers have developed and utilized different IPTs, such as wavelet transformation [7], Digital Image Correlation [8], percolation methods [9], Otsu's method [10], morphological approaches [11], the Canny edge detector [12], the Sobel operator [13], Hough transformation methods [14], and so on. After extracting the features, the cracks must be detected and classified using classifier algorithms. For further improvement in crack detection, researchers are nowadays more willing to use Machine Learning (ML)- and Deep Learning (DL)-based classifiers such as Support Vector Machines (SVMs) [15], Random Forests [16], Convolutional Neural Networks (CNNs) [17], and Recurrent Neural Networks (RNNs) [18], as neural networks can extract the necessary features automatically from concrete images and detect cracks more accurately.
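To make one of these classical IPTs concrete, the following Python sketch implements Otsu's global thresholding from scratch and applies it to a small synthetic "crack" image. The array sizes and pixel values are purely illustrative assumptions, not drawn from any cited dataset:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for an 8-bit grayscale image.

    Otsu's method picks the threshold that maximizes the
    between-class variance of the background/foreground split.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic "concrete" image: bright background with a dark thin crack.
img = np.full((64, 64), 200, dtype=np.uint8)
img[:, 30:32] = 40                  # dark, 2-pixel-wide vertical crack
t = otsu_threshold(img)
crack_mask = img < t                # dark pixels become crack candidates
```

In a real pipeline the binary mask would typically be cleaned with morphological operations before classification; here the bimodal histogram makes the threshold fall between the two gray levels, isolating the crack pixels.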
With the development of these image processing and classifier algorithms, vision-based crack detection methods are becoming more popular than ever before. As a result, a number of technical articles have already been published in this research field. However, the field still lacks systematic review papers presenting both a scientometric analysis and a critical analysis of the existing works to show the research trends and summarize the prominent IPTs and classifier algorithms for detecting cracks in concrete structures. This gap in the existing literature and the huge research scope motivated us to present a systematic review of the notable papers published between 2010 and 2020 that focus on image-based crack detection algorithms, in order to provide new researchers with useful information about this research field. In fact, Deep Learning started gaining popularity in 2012 with the advent of the AlexNet model, and researchers consequently began applying DL to crack detection after that time. This decade (2010-2020) has been specifically considered for this work because that period sets the basis for research in this area. We have opted not to include the years 2021 and 2022, as that would be beyond our research objective (which is to analyze the very first decade of this particular research domain).
The main contributions of this survey paper are as follows:
• It presents a scientometric analysis of the selected papers on image-based crack detection algorithms using data mining techniques to identify the current research trends, important research terms, influential publications, journals, and collaboration patterns of this research field.
• It presents a critical analysis of the papers related to image-based crack detection methods.
• Finally, it provides a summary of prominent image processing techniques and classifier algorithms for detecting cracks.

Literature Review
Computer vision, or image processing-based technology, has emerged as a prominent research field for crack detection over the last decade. As a result, it has become a great contributor to automating the crack detection process. Researchers from all over the world are devoting their efforts to developing and improving image-based crack detection methods, and these methods have become an engrossing research interest for both researchers and engineers. Along with their continuous efforts to improve the algorithms, researchers have also catalogued the existing methodologies in survey papers to accelerate research in this area. This section briefly summarizes a few primary aspects of the preceding review papers and discusses the prominence of modern articles that have established themselves as remarkable contributions to the research field.
The earliest review paper analyzed in this section was authored by John et al. in 1994 [19]. The authors discussed the usage of ultrasonic imaging techniques for detecting cracks in concrete structures and highlighted that substantial improvement of these techniques was still needed. Later, McCann et al. and Jahanshahi et al. presented deep analyses of nondestructive testing (NDT) methods and image processing-based technologies, such as the wavelet transform, Haar transforms, and Digital Image Correlation, for detecting cracks in [20] and [21], respectively. In 2014, Yao et al. [22] provided an overview of crack types and sources of cracks; in addition, the authors categorized crack detection approaches into direct sensing and indirect sensing approaches. Later, works such as [20,23-25] were published in 2016, discussing various computer vision methods for detecting cracks and presenting several platforms for image acquisition. In [24], Mohan et al. remarked that researchers would be more willing to use camera images for detecting cracks. Another notable survey was carried out by Gopalakrishnan et al. [26] in 2018, where the researchers reviewed the then-recent articles that used Deep Convolutional Neural Networks (DCNNs) for pavement crack detection. The authors also discussed and compared existing DL frameworks and network architectures for detecting cracks.
Vijayan et al. [19] provided an overview of a few DL algorithms along with other processing techniques and, by analyzing previous works, suggested DL algorithms as the most preferable methods. In 2020, Sharma et al. [27] highlighted crack propagation over time and the need to determine the depth and severity of cracks. The authors also noted that there is still considerable research scope for developing a crack detection technique that is both fast and accurate. In another paper from 2020, Hsieh et al. [28] presented ML and DL algorithms and the available public datasets for crack segmentation in pavement images. The authors determined that Fully Convolutional Networks (FCNs) and U-Net produce improved performance for crack segmentation. Table 1 recapitulates the survey papers published on image-based crack detection algorithms, presenting the publication year, source, major contributions, and limitations of each paper; the table is ordered by publication year. However, these survey papers neither collected articles systematically nor presented a bibliometric analysis to discuss the research trends, extract the most influential articles and countries, and present the collaboration patterns of this research field. In addition, many of them did not categorize the articles according to the image processing techniques they utilized, nor did they analyze the articles in enough depth for future researchers to gain a clear vision of the field of image-based crack detection techniques. As there is still a considerable literature gap and research scope, in this work we delineate the existing papers in this domain in a systematic manner; we present a bibliometric analysis as well as a critical analysis of the works in an effort to make it easier for new researchers to understand the research trends, hot topics, and methodologies of image-based crack detection algorithms.

Research Methodology
This work was designed using a mixed method to present a bibliometric analysis and a critical analysis of the papers that focus on the algorithms utilized for crack detection on concrete bridges, buildings, and roads. Figure 1 presents the overall methodology of this study. As seen in Figure 1, the first stage is the data collection for this systematic review. The second stage is the bibliometric analysis for identifying the key research areas. Stage 3 presents the critical review of the papers based on their abstracts, methodologies, and results, giving a brief overview of the development of the algorithms utilized for crack detection. To conduct the literature review in a systematic manner, we followed a set of guidelines to include the most relevant articles. The overall literature retrieval and data filtering process (the first stage of Figure 1) is visualized in Figure 2. The data collection process was divided into four phases based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) method: identification, screening, eligibility, and inclusion.
• Phase 1: The authors searched four online digital libraries in November 2020, namely Web of Science (WoS), ScienceDirect, IEEE Xplore, and the Wiley Online Library, using the search string "crack detection" AND ("bridge" OR "road" OR "concrete") AND ("vision" OR "image"). In this way, the authors initially downloaded 642 papers. The search was limited to a time span of ten years (2010 to 2020) to capture the latest technologies. After removing duplicate records, the authors identified a total of 395 papers at the end of Phase 1.
• Phase 2: In this stage, the authors screened, by title and abstract, 285 of the 395 papers extracted in Phase 1 that were published in peer-reviewed journals. To avoid including irrelevant articles, the authors developed exclusion criteria and discarded a paper if (a) its research focus was on non-image-based crack detection algorithms, (b) it discussed crack detection on reinforced plastic, beam, or steel structures, or (c) it was a review article instead of an original study. By employing these exclusion criteria, a total of 30 articles were excluded in this phase.
• Phase 3: In this stage, the remaining 255 papers were assessed by investigating their full texts. The authors excluded an article if it (a) was not closely related to the research focus of this study, (b) did not make a novel and efficient contribution to the research domain of image-based crack detection algorithms, or (c) did not provide detailed information about the design or implementation of the proposed idea. As a consequence, 21 papers were excluded, leaving 234 papers. After that, an additional 105 of these 234 papers were excluded from the bibliometric analysis because they were not supported by the data mining software utilized in this work.
• Phase 4: After completing all the previous phases, 129 papers were finally included in this systematic review for scientometric analysis, and 65 DL-based papers out of the 234 were selected for critical analysis.
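The record counts reported across the four phases can be tallied in a few lines; the following is a bookkeeping sketch of the numbers stated above, with variable names of our own choosing:

```python
# PRISMA-style record counts from the four filtering phases above.
identified   = 642                 # initial hits across the four libraries
deduplicated = 395                 # Phase 1: after duplicate removal
screened     = 285                 # kept after title/abstract screening
phase2_kept  = screened - 30       # Phase 2: exclusion criteria applied
phase3_kept  = phase2_kept - 21    # Phase 3: full-text assessment
sciento_set  = phase3_kept - 105   # records unsupported by the mining tools
critical_set = 65                  # DL-based subset kept for critical analysis

print(phase2_kept, phase3_kept, sciento_set)  # 255 234 129
```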

Bibliometric Analysis
Scientometric analysis is a technique to assess the academic quality of publications, sources, and authors and to determine the research trends of a particular topic through several statistical measures, such as publication rate, citation rate, collaboration pattern, and keyword occurrences. This work utilized two prominent visualization tools, VOSviewer [33] and CiteSpace [34], to provide a bibliometric analysis of the papers collected from the chosen databases. In the following section, this work extracts the most productive publications, authors, and publication sources in the research field of image-based crack detection algorithms. In addition, this work presents a scientific mapping analysis, with co-citation analysis, co-authorship analysis, keyword occurrences, and timeline view analysis as its subsections. Co-citation analysis can elicit the relatedness and measure the proximity of sources and authors. Co-authorship analysis can determine the collaboration pattern among countries and institutions. Keyword occurrences, in turn, can reveal the research trends and important terms of a particular research topic. Figure 3a shows the distribution of publications per year: in 2019, the number of published papers reached 29, which is about 22.48% of the total papers published by that time. In 2020, the publication rate continued its upward trajectory: until November 2020, forty-two (42) papers were published, which clearly indicates that researchers had started devoting more effort to this research area, and from the analysis, it can be expected that this research field will undergo a large increase in publication rate in the upcoming years.
As our analysis models the first crucial decade of this research trend, our range extends up to 2020; however, checking the most recent works, we find a similar pattern in 2021 and, so far at the time of writing, in 2022: the increase in the number of published papers is continuing. This work has also analyzed the citation rate of the publications per year. Figure 3b illustrates the distribution of citations achieved by the publications each year. The 129 papers were cited 3112 times during this ten-year period. The figure shows that the citation rate follows a continuously increasing trend over time. Dividing the time span from 2010 to 2020 into three phases, it can be seen from the figure that in the first phase (2010-2013), the number of citations in each year was less than 50 and the total number of citations was 51, covering only 1.54% of the total citations. In the next phase (2014-2017), the distribution of citations also follows an increasing trend. The highest number of citations received in this phase is 254 in 2017, preceded by 78 and 84 citations in 2015 and 2016, respectively. This phase accounts for 18.54% of the total received citations.
In the last phase (2018-2020), the number of citations increased significantly. In 2018, the publications were cited 462 times, and this number jumped to 838 in 2019, the largest year-over-year increment of received citations. Finally, in 2020, the publications received 1184 citations, the most of any year in the decade. The last phase covers about 75% of the total citations, which implies that impactful contributions have been made to this research field in recent years.
This work also uses an author-level metric, the H-index, which reflects both productivity and citation impact, to conduct the annual analysis of the publications. Figure 3c depicts the H-index distribution of the papers over the years. From the figure, it can be seen that the topmost H-index is 10, in 2019. The years 2018 and 2017 hold the second and third positions, with H-indices of 9 and 8, respectively. It can also be seen that the H-index fluctuates across the decade. The total H-index for the 129 publications is 23, meaning that 23 of the 129 publications have each received at least 23 citations. Furthermore, as the H-index is greater in the later part of the decade than in the earlier part, it can be inferred that the number of influential and productive papers has been increasing in recent years.
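The H-index used above can be computed directly from a list of citation counts. The following sketch shows the definition in code; the citation counts are made up for illustration and are not the actual dataset:

```python
def h_index(citations):
    """H-index: the largest h such that at least h papers
    have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank        # the rank-th best paper still has >= rank citations
        else:
            break
    return h

# Illustrative citation counts only (not taken from the reviewed corpus).
assert h_index([10, 8, 5, 4, 3]) == 4
assert h_index([25, 8, 5, 3, 3, 2]) == 3
```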

The Most Cited Publications
In this work, we found and analyzed the most influential and popular articles among the 129 articles based on the citations received by the publications. Setting a threshold of a minimum of 50 citations, we were able to extract 15 papers. These 15 papers were cited 2090 times, which is about 67.16% of the total citations received by all of the publications. As the lion's share of the citations comes from these papers, their productivity and influence in the research domain of image-based crack detection algorithms are evident.
These top-cited papers are summarized in Table 2 by their title, publication year, publication source, corresponding author's name, corresponding author's country, received citations, and average citations per year. The table is ordered by the number of citations received. The highest citation count (574) for any single article belongs to the paper entitled "Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks", published in 2017 in "Computer-Aided Civil and Infrastructure Engineering". This paper is so influential and popular among researchers that it received 574 citations within only 4 years, at a rate of 134.50 citations per year. The second highest on the list, "CrackTree: Automatic Crack Detection from Pavement Images", received 242 citations. This paper was published in 2012, and its citation rate is 26.89 per year.
A deeper analysis of Table 2 reveals that most of the papers are receiving citations over the years in a linear manner, but [35-37] are receiving citations at an increasing rate. Though [36] and [37] were published in 2019 and 2018, respectively, they received 80 citations each, with citation rates of 40 and 26.67 per year, which clearly indicates that, along with [35], these papers are going to contribute significantly to the research field. It also indicates that these papers are highly influential and attracting the attention of researchers within a relatively short time. In addition, refs. [38-40] have also maintained good citation rates over the years. On the other hand, refs. [41,42] have the fewest citations (56 and 52), and their low citation rates (5.60 and 7.43 citations per year) indicate that these papers are not receiving as much attention from researchers.

The Most Productive Publication Sources

In this subsection, this work describes the most productive publication sources in the field of image-based crack detection algorithms. The 129 collected papers were published in 64 different journals. This work extracted the top 10 publication sources based on the number of publications. These 10 journals published 68 (52.71%) of the 129 papers in total; the other 54 journals account for the remaining 61 (47.29%) papers. Table 3 summarizes these most productive journals by their name, total publications, total citations, average citations per year, Impact Factor, 5-year Impact Factor, and H-index. The table is ordered by the number of publications. From Table 3, it can be seen that the journal "Computer-Aided Civil and Infrastructure Engineering" holds the first position, with a total of 11 publications and 1100 citations. The Impact Factor (8.552) of this journal is also quite high, and its H-index of 8 clearly indicates its popularity among researchers.
"Sensors" by "MDPI" takes second place, with 10 publications and 193 citations. The top 5 journals on this list have the higher numbers of citations; the pattern is quite different for the other journals. For example, IEEE Access has published seven papers so far but received only 15 citations. The number of published articles in the other journals fluctuates from three to five, with low citation numbers, a clear indication that researchers are not paying much attention to the journals in the bottom part of Table 3 for papers related to the research field chosen in this work. However, it is notable that the journal "Machine Vision and Applications", which has an Impact Factor of only 1.605, published just 3 articles, yet those papers have already been cited 277 times. The number of citations received by this journal implies that the papers published in it are playing a significant role in the research field.
To understand the trend of citations and impact, we were also interested in journals that published the fewest papers but received high citation counts, so we searched the dataset for such journals. We found a few, such as "IEEE Transactions on Intelligent Transportation Systems" (2 papers, 257 citations), "Pattern Recognition Letters" (1 paper, 242 citations), and "IEEE Transactions on Automation Science and Engineering" (1 paper, 118 citations).
To understand the historical development of the top publication sources in terms of publications and citations, we have summarized the information in Table 4. From Table 4, it can be seen that all of the journals started regularly publishing papers related to image-based crack detection algorithms around 2018. Before that, i.e., 2010-2017, these journals published merely three to four papers per year, except in 2013, when they did not publish a single paper in this research field (in accordance with our set criteria). In the case of citations, only three journals, "Computer-Aided Civil and Infrastructure Engineering", "Sensors", and "Machine Vision and Applications", received citations in all of the years; the rest received citations from 2015 onwards. Among the journals, "IEEE Access" and "Applied Sciences-Basel" received citations only in 2019 and 2020. Notably, the journals are following an upward trend in received citations over the years and, as a result, all of them received their maximum number of citations in 2020.

The Most Productive Authors
In this subsection, this work discusses the most productive authors in the research area of image-based crack detection algorithms. From the dataset collected from WoS, we found that 425 authors are responsible for the 129 papers. We extracted the top ten authors according to the number of publications (five authors) and the received citations (five authors). Table 5 summarizes the most productive authors by their name, number of total publications, total citations, average citations per year, number of papers published as first author, H-index, and country. The first part of Table 5 presents the top five authors with the highest number of publications. From the table, it can be seen that the highest number of publications by any single author is five. "Ying Chen" and "Zhong Qu" have both published five papers, but "Ying Chen" has received more citations than "Zhong Qu". Interestingly, "Zhong Qu" has published all five of his papers as the first author, while no other author has published more than one paper as the first author. "Weigang Zou" and "Wei Li" have both published four papers and received eighteen and nine citations, respectively. "Qingquan Li" holds the fifth position on this list with three publications; however, with only 3 publications, "Qingquan Li" received 388 citations, a clear indication of both the productivity and the high influence of this author. One more noticeable point from the first part of this table is that all of the most productive authors in terms of the number of publications are from China, which underlines the significance of Chinese researchers in the research field chosen in this work.
The second part of this table presents the top five most cited authors. It is visible from the table that the first three authors have the same values for all of the measurement parameters used in this table. They are the authors of the highest-cited paper of the dataset, entitled "Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks". They published only one paper in the chosen research field, yet received 575 citations, an average of 143.75 citations per year, proving their excellence in this field. "Qingquan Li" is the only author among the top ten who appears in both the first and the second part of Table 5. Finally, "Mao Qin Zou" holds 5th place, with 2 publications and 298 citations. Among the top five authors extracted based on received citations, two are from China and the other three are from Canada. However, "Young-Jin Cha" is a researcher of South Korean origin who was working at the University of Manitoba, Canada, at the time of publication of his paper.

The Most Productive Countries
Let us now examine the most productive countries in the research domain of image-based crack detection algorithms. From the dataset collected from WoS, 29 countries are responsible for the 129 papers. Figure 4 presents the geographical distribution of the papers around the world; in this figure, crimson-colored countries published more papers than lavender-colored ones. Analyzing the contributions by continent, Asia is the leading continent, publishing 58.43% of all the publications. America, Europe, and Australia are responsible for 20.48%, 15%, and 3.61% of the published papers, respectively, while the remaining 2.48% comes from other parts of the globe. After presenting the geographical distribution of the publications, this work extracted the top 10 countries based on the number of publications. Table 6 summarizes the most productive countries by name, total publications, total citations, average citations per year, the number of papers cited at least 30/20/10/5 times, and the H-index. The table is ordered by the number of papers published by each country. From Table 6, it can be seen that China is the leading country, with 54 publications and 722 citations. The H-index of China (14) is also the highest among these countries, shared with the USA. The next country on the list is the USA. The number of papers published by the USA (29) is less than China's, but its total number of citations (1450) is not only greater than China's but also the highest among all the countries. The USA also has the highest number of publications (8) that received at least 30 citations, and it claims the highest rate of citations per year (161.11). South Korea holds the 3rd position on the list, with 13 articles and 104 citations. Following South Korea, Japan published 11 articles with 551 citations.
However, the citation rate of South Korea (52) is quite close to that of Japan (55.10), despite the large difference in their total citations.
A closer look at Table 6 reveals that, despite being number 5 on the list with 7 publications, Germany has received only 21 citations, while Canada and Spain received 584 and 252 citations, respectively, from only 5 publications each, which clearly denotes the influence of the papers published from these countries. England, Australia, and Vietnam published 5, 4, and 3 papers and received 21, 16, and 90 citations, respectively. There are also a few countries in the dataset that are absent from the top 10 but received high citation counts from very few publications, which indicates the significance of the papers published there, e.g., Portugal (1 publication, 139 citations) and India (2 publications, 98 citations).

Co-Citation Analysis
This work considered co-citation analysis as one of the techniques for science mapping analysis. A co-citation occurs when a pair of published articles, say x and y, are cited together in another published document z. In this section, this work presents a co-citation analysis using cited sources and cited authors as the units of analysis to show the relatedness between journals and authors in terms of research focus: if two sources or two authors are frequently cited together, they likely share a common research interest. This work set a threshold of at least 30 citations and found 20 sources that satisfied it. Table 7 presents these sources with their total link strengths of co-citation, ordered by total link strength. The total link strength of a source is the sum of the link strengths between that source and all other sources, whereas a link strength is the frequency with which two sources are co-cited in a third source.
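The link-strength notion defined above can be illustrated with a short sketch; the journal abbreviations and the toy citing corpus below are hypothetical, chosen only to show the counting:

```python
from itertools import combinations
from collections import Counter

def cocitation_counts(reference_lists):
    """Count how often each unordered pair of sources is cited together.

    `reference_lists` maps a citing document to the set of sources it
    cites; each pair co-cited in a document adds 1 to that pair's
    link strength.
    """
    links = Counter()
    for refs in reference_lists.values():
        for pair in combinations(sorted(set(refs)), 2):
            links[pair] += 1
    return links

# Toy corpus: three citing documents and abbreviated source names.
papers = {
    "z1": {"CACIE", "AutCon", "JCCE"},
    "z2": {"CACIE", "AutCon"},
    "z3": {"CACIE", "JCCE"},
}
links = cocitation_counts(papers)
# Total link strength of one source = sum over all pairs containing it.
cacie_strength = sum(v for k, v in links.items() if "CACIE" in k)
```

Here "CACIE" and "AutCon" are co-cited in two documents, so their link strength is 2, and the total link strength of "CACIE" sums its link strengths with every other source; VOSviewer computes its weights in this spirit.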
For a better analysis, this work generated the scientific landscape of the co-citation network of the journals using the VOSviewer software (Figure 5). From Figure 5, it can be seen that the journals are divided into a total of 3 clusters (red, green, blue) with 190 links and a total link strength of 13,484. Each node in Figure 5 represents the corresponding journal; the bigger the node, the higher the weight. For the co-citation analysis, total link strength has been selected as the weight in this work, meaning that the sources with higher weight have higher link strength. A connection line between two journals indicates that these two sources have been cited together in a publication; the thicker the line, the more frequently they have been cited together. An interesting aspect of the clusters is that all the sources of each cluster have been cited along with the sources of all other clusters.
Looking deeper, it can be noticed that the red cluster contains a total of 10 sources (50%). The most prominent journal in the red cluster, and indeed in all the clusters, is "Computer-aided Civil Infrastructure and Engineering", with 19 links and a total link strength of 5201. This journal has been cited with all the other journals; however, it has been cited the most times (819) along with the journal "Automation in Construction", which clearly indicates the high relatedness of these two journals in this specific research field. "Automation in Construction" holds the second position, with a total link strength of 3053 in the red cluster. The other influential sources in this cluster are "Journal of Computer in Civil Engineering" (2943), "Construction and Building Materials" (1010), and "Machine Vision and Applications" (933).
The second cluster (green) consists of seven sources. The most influential source in this cluster is "Proceeding CVPR IEEE", with a total link strength of 1905. This source has been cited the most times (476) with the journal "Computer-aided Civil Infrastructure and Engineering". The other influential journals of this cluster are "IEEE Transactions on Pattern Analysis and Machine Intelligence" (1541) and "IEEE Transaction on Intelligent Transportation System" (1438). An interesting point about the green cluster is that its sources have been cited more often with a few sources of the red cluster ("Computer-aided Civil Infrastructure and Engineering", "Automation in Construction", "Journal of Computer in Civil Engineering") than with the sources of their own cluster, which undoubtedly indicates the high influence of those three red-cluster sources in terms of co-citation. Finally, the blue cluster consists of only three sources. The most influential source of this cluster is "Advanced Engineering Information", with a total link strength of 881. The other two sources of this cluster are situated close to each other in Figure 5, while "Advanced Engineering Information" lies far away from them. The sources of the blue cluster have been cited more often with the leading sources of the red and green clusters than with each other. This implies that although the sources of the blue cluster are related by research topic, over the years they have rarely been cited together.
After analyzing the co-citation network of cited sources, this work analyzes the co-citation network of cited authors. Accordingly, this work set a threshold of at least 30 citations, and among 2361 cited authors, only 14 authors met it. Table 8 presents these cited authors with the total link strength of co-citation, ordered by the total link strength of the authors. For better understanding, we have generated the scientific landscape of the co-citation network of the cited authors using the VOSviewer software, as shown in Figure 6.
From Figure 6, it can be seen that the authors are divided into a total of 2 clusters (red, green) with 91 links and a total link strength of 1973. As with the cited sources, total link strength has been selected as the weight of the nodes. A connection line between two authors indicates that they have been cited together in other publications; the thicker the line, the more frequently they have been cited together. It can be seen that the red cluster contains a total of nine authors. The leading author in this cluster is "Young-jin Cha", with a link strength of 527. This author has been cited the most times (59) with "Qinayun Zhang", which clearly indicates that their publications focus on a similar research topic. Young-Jin Cha is also cited many times along with other authors, such as "Fu-Chen Chen" (50), "Yann LeCun" (49), and "Tomoyuki Yamaguchi" (49). The other influential authors in this cluster, according to total link strength, are "Qinayun Zhang" (372), "Qiang Zou" (255), and "Fu-Chen Chen" (250). The cited authors of the red cluster have higher citation linkage with other red-cluster authors than with the authors of the green cluster. The green cluster consists of five authors. The most prominent cited author of this cluster is "Tomoyuki Yamaguchi", with a link strength of 387. This author has the maximum link strength with "Yu Fujita" (64) and also a high link strength with "Mohammad R. Jahanshahi" (56). The other influential authors of this cluster include "Mohammad R. Jahanshahi" (314) and "Christian Koch" (255). Like the red cluster, the cited authors of the green cluster have higher citation linkage with authors of their own cluster than with authors of the red cluster. This differs from the co-citation network of cited sources, in which a few sources were so influential that they had high citation linkage with the sources of all the clusters.

Co-Authorship Analysis
Collaboration in research is very important for producing creative ideas and implementing them in an easier and smarter way, as an individual can find it too difficult to complete a research task alone. Co-authorship analysis is another technique that has been used in this work as a bibliometric measurement. In this section, this work presents a co-authorship analysis using countries and institutions as the units of analysis to show the collaboration pattern among the authors of different countries and institutions. For the co-authorship analysis of the countries, this work set a threshold of a minimum of 5 documents per country and found 8 of the 29 countries that satisfied it. Table 9 presents the countries, ordered by total link strength. For a better analysis, we have generated the scientific landscape of the co-authorship network of the countries using the VOSviewer software (Figure 7). From Figure 7, it can be seen that the countries are divided into a total of three clusters (red, green, blue). Each node in Figure 7 represents a country; the bigger the node, the more the country has collaborated with other countries. A connection line between two countries reveals the presence of collaboration between them, and the thicker the line, the more frequently the countries have collaborated. It is clear that the red cluster is the most prominent among the clusters; it has four countries in total. Among the countries in the red cluster, "China" is the leading one, with a total link strength of 18. China has collaborated with all the other countries in Figure 7, which clearly indicates its productivity and significance. However, China has collaborated the most times (7) with the USA. The next influential country in the red cluster is the "USA", with a total link strength of 9.
However, the USA has collaborated with only three countries (China, South Korea, Canada), while South Korea and Canada have collaborated with only China and the USA. The green cluster contains only three countries, all from Europe (Germany, Spain, England). These countries collaborate with each other as well as with China, which implies that European researchers in this particular area generally collaborate with other European researchers. Finally, Japan is the only country in the blue cluster. Though Japan is an Asian country, it is not in the same cluster as China; Japan collaborates only with China, with a link strength of 1.

Co-Occurrence and Timeline View Analysis
Keywords of a research paper are very important tools for understanding the research topic of an article. Keywords are said to co-occur when they are present in a single article. In this subsection, this work presents a co-occurrence analysis of the keywords to map the research trends and highlight the research hotspots in the field of image-based crack detection algorithms. From the total of 129 publications, we obtained 519 keywords altogether using VOSviewer. Among these 519 keywords, 30 satisfied the threshold that we set as the minimum number of co-occurrences of a keyword (5). Figure 8 presents the network visualization of the publications' keyword co-occurrences. The keywords are represented by the nodes or circles in Figure 8. The size of a node reveals the weight, or the number of occurrences, of a keyword: the bigger nodes represent the most frequently occurring keywords, while a small node means that the keyword has not occurred many times in the publications. It can be noticed from Figure 8 that "crack detection" is the keyword with the highest number of occurrences. A few other keywords with a high number of occurrences include "deep learning" (25), "damage detection" (21), and "system" (17). The connection lines between the nodes also carry important information. A line between two nodes implies that the keywords appeared together, and its thickness reveals the link strength, i.e., the number of co-occurrences between the keywords: the thicker the line, the more co-occurrences the keywords have. From Figure 8, it can be noticed that the keyword "crack detection" has the highest link strength (121) among the keywords. The node "crack detection" has a higher link strength, or a thicker line, with "deep learning" (11), "image processing" (9), "concrete" (9), and "damage detection" (9).
The relationship of "deep learning", "image processing", "concrete", and "damage detection" with "crack detection" implies the close integration of these keywords. It is a clear indication that during the crucial 2010-2020 period, deep learning-based image processing was heavily utilized for detecting cracks and damage in reinforced concrete structures.
Another important thing in Figure 8 to notice is the distance among the nodes. The distance among the nodes represents the semantic similarity of the keywords. The keywords which have stronger similarity are situated within a shorter distance. In contrast, a longer distance denotes a lower similarity between the nodes. VOSviewer divides the keywords of a dataset into several clusters or sets of keywords based on their similarity. From the same figure, it can be observed that the keywords are divided into a total of three clusters denoted by three different colors (red, blue, and green). Table 10 summarizes the clusters. From Figure 8, it can be noticed that the red cluster is the prominent one among the clusters containing 12 keywords. The most frequent keyword in the red cluster is "damage detection" (21), which has a total of 25 links; this means that it co-occurred with 25 different keywords in the articles. The other notable keywords in this cluster include "system" (17), "algorithm" (16), "inspection" (14), and "model" (12), which highlight the technical and mathematical aspects in the case of damage or crack detection. The green cluster's core is on "deep learning" (25), with a close linkage with other keywords such as "image processing" (18) and "computer vision" (9), highlighting the importance of deep learning-based IPTs for crack detection. The blue cluster connects "crack detection" (45), which is the most frequent keyword in the publications, with "concrete" (12), "segmentation" (11), and "convolutional neural network" (11), highlighting the crack detection and segmentation on reinforced concrete structures.
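The occurrence and co-occurrence counts underlying Figure 8 follow a simple counting rule. The sketch below illustrates that rule with invented keyword lists; the actual analysis in this review was performed with VOSviewer:

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(papers, min_occurrences=5):
    """Count keyword occurrences, drop keywords below the threshold
    (5 in this review), and count co-occurrences among the survivors."""
    occurrences = Counter(k for kws in papers for k in set(kws))
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}
    cooc = Counter()
    for kws in papers:
        for pair in combinations(sorted(set(kws) & kept), 2):
            cooc[pair] += 1
    return occurrences, cooc

papers = [["crack detection", "deep learning"],
          ["crack detection", "concrete"],
          ["crack detection", "deep learning"]]
occ, cooc = keyword_cooccurrence(papers, min_occurrences=2)
print(occ["crack detection"])                      # -> 3
print(cooc[("crack detection", "deep learning")])  # -> 2
```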
After analyzing the clusters, we extracted the top 10 keywords in the publications. Table 11 summarizes the top 10 keywords with their frequencies, links, and link strengths, ordered by frequency. From Table 11, it can be observed that "crack detection", "damage detection", "system", "inspection", and "identification" are each connected with all other keywords on the list, so each of them has nine links, which clearly indicates that these terms are closely integrated and indivisibly connected, and that they are the core keywords in this research area. Another noticeable point from this table is that keywords with higher frequencies do not always have higher link strength. For example, "image processing" is at number 4 on the list by frequency (18) and has 7 links, but its link strength is only 19, which indicates that it has not co-occurred many times with other keywords. For this work, we have also made a timeline view of the keywords (which met the threshold) using the CiteSpace software to present the development trend of the important topics during 2010-2020 (Figure 9). From Figure 9, it can be noticed that there are a total of four stages in terms of time. In the first stage, from 2010 to 2013, the prominent keywords were "neural network", "crack detection", "image processing", and "computer vision"; the research on crack detection in this stage relied on neural networks and image processing. In the later stage, from 2013 to 2016, research on crack detection began to increase, with a particular focus on mathematical models and technical aspects; for this reason, the prominent keywords of this period were "algorithm", "system", "model", and "inspection". The research on crack detection underwent a revolution during the third stage (2016-2019).
Researchers started utilizing deep learning- and convolutional neural network-based techniques to detect cracks in reinforced concrete structures. The notable keywords of this period were "deep learning", "convolutional neural network", "pavement crack detection", and "bridge inspection". Finally, in the last stage, from 2019 to 2020, the number of keywords is very low. In this stage, the specific focus was on "structural health monitoring" and "3d asphalt surfaces". In our observation, this happened because the state-of-the-art technologies had already emerged in the third stage (2016-2019); these technologies were still employed in 2019-2020, and no newer technology was developed for detecting cracks during that period. Table 12 lists the keywords of crack detection-related publications that occurred during the four different periods.

Critical Analysis
After analyzing the previous survey papers and performing a bibliometric analysis based on keywords, we found that in recent years, DL methods have become more viable and have received much more attention from researchers. As a consequence, we decided to present a critical analysis of the papers based (especially) on DL techniques to elucidate and acquire knowledge on the DL methods used for crack detection. Therefore, after omitting the articles based on traditional techniques using the methodology described in Section 3, this work ended up with 65 papers (Figure 2). We grouped these 65 papers based on the type of computer vision technique used in them, i.e., classification, object detection, and segmentation, and then analyzed them based on their problem statements, methodologies, and results. After summarizing the papers, we brought forward some common questions, whose answers are summarized in Tables 13-15 for the papers of the different categories.

Classification
In [49], Tran et al. presented a two-step sequential Mask region-based Convolutional Neural Network (Mask-RCNN) model to classify pavement crack types and the severity level of the cracks. The authors trained, validated, and tested their model using 32,563 images collected by a CMOS vision sensor mounted on a road screening vehicle. After completing the training process, the model was able to classify three types of cracks (i.e., longitudinal, transverse, fatigue) as well as the severity level of the cracks (i.e., low, medium, high). Tran et al. mentioned that the model was 92.10% accurate and showed 96.32% and 94.67% average precision and recall, respectively. The authors compared their method with a few other classification techniques and showed that their model was more accurate than the others; they also claimed to have addressed the crack classification problem with the highest number of classes (nine). Moreover, the authors measured the widths of the cracks; though the predicted widths differed slightly from the original widths they considered, this error is acceptable.
In [50], Wang et al. proposed a new framework for detecting cracks by fine-tuning the AlexNet architecture. The authors considered the class imbalance problem and the presence of disturbance on non-crack images and solved the issues by developing an active learning method. The proposed framework used a sliding window approach to filter out the images and divided one image into four training images, which increased the number of training images and facilitated the classification task. Wang et al. trained their model and obtained 97.55% accuracy. In addition, the authors compared their model with ChaNet and showed that their method outperformed ChaNet in terms of all evaluation metrics.
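Cropping a large image into smaller windows, as in the sliding-window step above, is a common way to multiply the number of training samples. A minimal sketch of the idea follows; the parameter values are illustrative and not those used in [50]:

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide a square window over the image and collect the patches."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

# A 4x4 image with a 2x2 window and stride 2 yields four training patches,
# mirroring the "one image into four training images" idea.
image = np.arange(16).reshape(4, 4)
print(len(extract_patches(image, patch_size=2, stride=2)))  # -> 4
```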
In [51], Zhang et al. proposed a hybrid method based on IoT (Internet of Things) technology and a CNN model for classifying cracks as well as monitoring the structural health condition of concrete bridges in real time. In this work, Zhang et al. first preprocessed the crack images by converting the images into grayscale, increasing the contrast using a piecewise linear function, and denoising the images using wavelet transformation. Then, the authors developed a CNN model to classify different types of cracks (i.e., small, large, serious cracks) for measuring the severity level of the cracks. Zhang et al. trained their model with 300 images and obtained an accuracy of more than 90%.
In [52], Dung et al. detected cracks at the joints of bridges using three deep learning methods. First, the authors developed an SCNN model from scratch to classify crack images. Second, they utilized a pre-trained VGG-16 model, and finally, they fine-tuned the top layers of the VGG-16 model for detecting cracks. The authors trained the models using images collected from a previous fatigue crack inspection at Tokyo University and demonstrated that the third method outperformed the first two in terms of accuracy (97%), while the other two methods produced 90% and 94% accuracy, respectively. They also mentioned that data augmentation helped to increase the accuracy of the models by 5%, 2%, and 5%, respectively.
In [37], Dorafshan et al. compared the performance of a few conventional edge detector methods (i.e., Roberts, Prewitt, Sobel, etc.) with Deep Convolutional Neural Networks (DCNNs) for classifying cracks in concrete images. The authors considered AlexNet as the DCNN model and trained it in three different modes (i.e., fine-tuned, transfer learning, and classifier). Dorafshan et al. mentioned that the AlexNet models achieved 97-98% classification accuracy, whereas the traditional methods managed only 53-79% accuracy. They also showed that AlexNet with a transfer learning scheme obtained the highest accuracy.
In [54], Gopalakrishnan et al. deployed a DCNN framework based on a truncated VGG-16 model for classifying asphalt and Portland cement concrete images as "crack" or "non-crack". The authors utilized a subset of images from the dataset of the FHWA and LTPP programs and extracted features using the pre-trained VGG-16 model.

Detection
In [66], Cho et al. presented a model based on AlexNet for detecting cracks on concrete surfaces. They collected images from concrete surfaces and categorized them into a total of five classes, including cracks, plants, intact surfaces, and two crack-like patterns. They trained the model using transfer learning and fine-tuned the AlexNet. After classifying the images, Cho et al. utilized a probability map in the third stage and detected the cracks with bounding boxes. Their model produced accuracy, precision, and recall of 98%, 86.73%, and 88.68%, respectively. They utilized a drone to capture real images and performed an on-site experiment, demonstrating that their model successfully detected all cracks except one thin crack.

Segmentation
In [69], Li et al. presented a convolutional encoder-decoder network (CedNet) for detecting cracks at the pixel level, using the DenseNet-121 architecture as the encoder part of the proposed CedNet. In this work, the authors built a crack detection dataset of 1800 images and trained the model on it. After successfully detecting the cracks with 98.90% accuracy, the authors performed a perspective transformation to correctly reconstruct the distorted predicted images. They also measured the width of the cracks and determined their orientation by employing the Euclidean distance transformation and the least squares principle, respectively. Li et al. compared their model's performance with Mask-RCNN as well as FCN and showed that their model was able to detect cracks more accurately than those two models, even when the cracks were thin.
In [70], Huyan et al. proposed an encoder-decoder-based architecture named CrackU-Net, improving the "U"-shaped U-Net model for detecting cracks on pavement images. The authors deployed a 3D data collection system to build the dataset, which consisted of 3000 images. In this work, Huyan et al. took the problem of false positive crack detection into consideration and successfully mitigated it using their model. The proposed model produced 99.01% accuracy and outperformed some well-known traditional methods (e.g., Sobel, Roberts, LG) as well as FCN and U-Net for the pixel-level segmentation of pavement crack images.
In [71], Chen et al. exploited the rotation-invariant property of cracks for the first time; accordingly, they integrated active rotating filters (ARFs) with an FCN model named DeepCrack and presented a new model called ARF-Crack for detecting cracks at the pixel level. The authors assessed their model on four different benchmark datasets, including DeepCrack, CFD, Crack500, and GAPS384. They showed visually that their model was able to segment cracks accurately on all the datasets, and they demonstrated through numerical results (e.g., average precision, recall) that their model outperformed the DeepCrack, FPHBN, and IRA-Crack models. They also mentioned that the proposed model requires fewer parameters and less training time.
In [72], Pan et al. developed a deep learning model named the spatial-channel hierarchical network (SCHNet), employing the VGG-19 model as the baseline for segmenting cracks in reinforced concrete structures. The authors integrated a self-attention mechanism into their proposed model through three different modules (a feature pyramid attention module, a spatial attention module, and a channel attention module) to establish relationships between pixels and improve the reliability of crack segmentation. Pan et al. trained their model with the SDNET2018 dataset and selected Mean IoU (Intersection over Union) as the evaluation metric for their task. They mentioned that each attention module gradually increased the model's IoU, which finally reached 85.31%. The authors compared their model with a few other state-of-the-art methods and confirmed that their model was superior. They also tested their model under various conditions (i.e., holes, shadows on the surfaces, rough surfaces), and each time their model successfully segmented the cracks.
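For reference, the IoU metric used to evaluate segmentation models like SCHNet compares a predicted mask with the ground truth pixel by pixel. A minimal sketch for binary crack masks follows; the example masks are ours, not from [72]:

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union of two binary masks (1 = crack pixel)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, target).sum() / union

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(iou(pred, target))  # 2 shared pixels / 4 in the union -> 0.5
```

Mean IoU then averages this score over the classes (here, crack and background).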
In [73], Kalfarisi et al. presented two deep learning methods: one is FRCNN-FED, a combination of the faster region-based convolutional neural network (FRCNN) and structured random forest edge detection (SRFED) methods, and the other is Mask-RCNN. The authors attempted to detect cracks with bounding boxes and segment them simultaneously. Kalfarisi et al. trained their models with images collected during real-life structural inspections and evaluated the techniques' performance by detecting cracks in images of roads, bridges, buildings, and so on. They claimed that their models were able to detect cracks as well as successfully measure their length and width.
In [74], Lee et al. presented a semi-supervised learning method for detecting cracks in concrete structures. To reduce the cost of acquiring the vast quantity of data needed for supervised learning, Lee et al. developed an adversarial network to produce labeled confidence maps from unlabeled images. After that, the authors applied a multiscale segmentation learning network, instead of an encoder-decoder architecture, to segment crack images efficiently. The authors trained their method using the METU and USU datasets and achieved 98.176% accuracy. To show the robustness of the method, Lee et al. compared their technique with a few other encoder-decoder-based models, and it outperformed all of them in terms of all evaluated metrics.
In [75], Fan et al. modified the U-Net architecture by embedding a hierarchical feature learning (HF) module and a multi-dilation module (MDM) and proposed a novel framework named the U-Hierarchical Dilated Network (U-HDN) for detecting cracks on asphalt pavements at the pixel level. The authors employed the MDM with different dilation rates to extract crack features of different context sizes and the HF module to predict feature maps at different scales, fusing them to obtain an accurate segmented image of pavement cracks. Fan et al. trained their model on both the CFD and AigleRN datasets, and their model showed better precision and recall values than all the compared methods.
In [76], Gil et al. developed a novel method named ConnCrack by combining a conditional Wasserstein generative adversarial network (cWGAN) with connectivity maps for crack detection on pavement images. In this work, the authors also published a dataset named EdmCrack600, containing 600 images, and trained their model on both the EdmCrack600 and CFD datasets. The authors evaluated their model in terms of precision, recall, and F1-score, comparing its performance with a few conventional methods (i.e., Canny, CrackTree, CrackForest) and deep learning methods (ResNet152-FCN, VGG19-FCN, CrackNet-V). Gil et al. demonstrated that their model outperformed the other methods in terms of all metrics; however, they noticed that the model performed better on the CFD dataset than on the EdmCrack600 dataset.
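Since precision, recall, and F1-score recur throughout these comparisons, a quick sketch of how they are computed from pixel-level counts may help; the counts below are invented for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """tp: crack pixels correctly detected; fp: non-crack pixels flagged
    as crack; fn: crack pixels the model missed."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(p, r, round(f1, 4))  # -> 0.8 0.8 0.8
```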
In [77], Alipour et al. presented a fully convolutional neural network named Crackpix based on the VGG-16 architecture in order to perform semantic segmentation of cracks on concrete structures. The authors employed five FCN architectures (i.e., FCN32s, FCN16s, FCN8s, FCN4s, FCN2s) and trained their model using images collected from several bridges, roads, and building surfaces. The method was able to segment cracks successfully with 92.17% validation accuracy, and the authors claimed that it was the first FCN model which could segment images of arbitrary sizes.
In [78], Ji et al. utilized DeepLabV3+ for segmenting cracks on pavement images. The authors also deployed a crack quantification algorithm named the fast parallel training (FPT) algorithm for calculating the length, width, area, and ratio of the cracks. Ji et al. trained their model using a dataset of 300 images, and the model successfully segmented several types of cracks (i.e., single, multiple, intersecting, and alligator cracks). They evaluated their model by means of the MIoU metric, which was calculated as 0.7331. Ji et al. compared their model with a few other state-of-the-art deep learning models (i.e., FCN, DeepCrack, Encoder-Decoder) and demonstrated that it could predict unseen crack images better than all the compared methods.
In [79], Wei et al. designed an algorithm based on a GAN (Generative Adversarial Network) and neural style transfer for detecting cracks on road images. The authors generated training images from only one sample crack image using the GAN simulator. Then, Wei et al. utilized a segmentation algorithm named Seg, which produced an F1-score of 0.82 and successfully predicted the cracks at the pixel level.
In [80], Lau et al. presented a U-Net model whose encoder is a ResNet-34 architecture for segmenting cracks on pavement images. The authors trained their model on both the CFD and Crack500 datasets. After completing training, the authors demonstrated that their model produced F1-scores of 96% and 73% on the CFD and Crack500 datasets, respectively, and successfully predicted which pixels of the pavement images contain cracks. Lau et al. compared their method with a few other U-Net-based models and an FCN model and showed that it performed better than all the compared methods in terms of precision, recall, and F1-score. The authors then performed several ablation studies (i.e., training with and without frozen layers, and with and without the SCSE module) to check the gains in performance and robustness, demonstrating with numerical values that these choices helped the model perform better.
In [81], Song et al. presented a deep learning model based on ResNet, in which a multiscale dilated attention (MDA) module and feature fusion upsampling (FFU) modules are embedded, to detect cracks at the pixel level in pavement images. The authors utilized the MDA module for extracting high-level features and the FFU module for restoring the crack spatial resolution. Song et al. trained their model using the dataset named "CrackDataset" and produced higher precision, recall, and F1-scores than a few other state-of-the-art deep learning methods, such as SegNet, U-Net, PSPNet, DL-V3+, and DFN. After detecting the cracks at the pixel level, Song et al. classified the crack types (i.e., transversal, longitudinal, block, alligator) and severity levels (i.e., normal, medium, high) by identifying the branches and measuring the height and width of the cracks. The authors demonstrated that their model obtained over 95% accuracy in classifying the cracks.
In [83], Yao et al. presented a novel concept to reduce the computational complexity of encoder-decoder-based architectures for detecting concrete cracks at the pixel level. They proposed a switching module named SWT, consisting of a binary classification header that classifies crack and non-crack images and passes only the positive samples to the decoder module, while directly outputting a negative map for the others without invoking the decoder. Yao et al. integrated their switching concept into U-Net and the DeepCrack model, utilizing the CrackTree206 and AIMCrack datasets. The authors demonstrated through both quantitative and qualitative analysis that their method did not diminish performance while reducing computation time and complexity. In the end, they showed that U-Net and the DeepCrack model ran about 30.7% and 62.9% faster, respectively, with SWT than without it.
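The switching idea can be pictured as a simple gate in front of the decoder. The stub functions below are hypothetical stand-ins for illustration, not the actual SWT or U-Net implementation:

```python
import numpy as np

def segment_with_switch(image, encoder, classifier, decoder):
    """Run the costly decoder only when the binary header predicts a crack;
    otherwise output an all-zero (negative) map directly."""
    features = encoder(image)
    if classifier(features) < 0.5:  # predicted non-crack
        return np.zeros(image.shape[:2], dtype=np.uint8)
    return decoder(features)

# Toy stubs: "crack" iff any pixel value exceeds 0.5.
encoder = lambda img: img
classifier = lambda feats: float(feats.max() > 0.5)
decoder = lambda feats: (feats > 0.5).astype(np.uint8)

blank = np.zeros((2, 2))
crack = np.array([[0.0, 0.9], [0.0, 0.0]])
print(segment_with_switch(blank, encoder, classifier, decoder).sum())  # -> 0
print(segment_with_switch(crack, encoder, classifier, decoder).sum())  # -> 1
```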
In [84], Cai et al. developed an FCN model named pavement and bridge crack segmentation network (PCSN-512) by modifying the SegNet architecture to perform semantic segmentation on crack images of pavements and bridge decks. The authors built a dataset of 5000 images and trained their model using the Adadelta optimizer. They evaluated the method on crack images with perplexing backgrounds and compared it with a few other state-of-the-art networks (i.e., FCN, MRCNN, PCSN). The authors demonstrated that the proposed PCSN-512 segmented the images successfully with 93% accuracy and outperformed the compared models in terms of inference time, precision, and recall.
In [85], Liu et al. utilized U-Net for the first time to detect cracks in concrete images. The authors collected a total of 84 images from Huazhong University, China, and trained their model with the Adam optimizer. Liu et al. evaluated their model by means of three metrics (i.e., precision, recall, F1-score) and also compared it with Cha's CNN [35] and an FCN model. The authors demonstrated by quantitative analysis that their model outperformed all the compared methods.
In [86], Qu et al. performed both classification and semantic segmentation on pavement crack images. The authors fine-tuned the LeNet-5 model for the classification task and modified the VGG-16 model by reducing some convolution layers, adding a 1 × 1 Conv layer after an Eltwise layer, and using the horizontal expansion method for detecting cracks at the pixel level. Qu et al. also built two datasets, named CCD1500 and CCD861, in this work. They trained their model with the CCD861, CFD, DeepCrack, and Crack200 datasets and demonstrated that their model performed better than all the compared methods (i.e., VGG-16, U-Net, Percolation) for each dataset.
In [87], Fan et al. deployed an ensemble learning technique on CNNs to detect cracks in pavement images. The authors used only convolution layers and fully connected layers, without any pooling layers, in each individual CNN model, as pooling layers lose important pixel information. The authors averaged the outputs of the CNN models to produce the predicted pavement images. Fan et al. trained their model on both the CFD and AgileRN datasets; they experimented with the number of CNN models to be ensembled and finally selected three, as they obtained the highest results from three ensembled CNN models for both datasets. They also compared their model with a few other state-of-the-art methods and claimed that it outperformed all of them in terms of precision, recall, and F1-score.
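The ensemble scheme, averaging per-pixel outputs of several CNNs and thresholding the result, can be sketched as follows (the probability maps and threshold below are illustrative assumptions, not values from [87]):

```python
def ensemble_predict(prob_maps, threshold=0.5):
    """Average per-pixel crack probabilities from several models,
    then threshold the mean into a binary crack map."""
    n = len(prob_maps)
    avg = [sum(pix) / n for pix in zip(*prob_maps)]
    return [1 if p >= threshold else 0 for p in avg]

# three hypothetical model outputs for a 4-pixel patch
maps = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.7],
    [0.7, 0.2, 0.3, 0.8],
]
fused = ensemble_predict(maps)
```

Averaging suppresses the uncorrelated errors of the individual networks, which is why the ensembled prediction tends to beat any single member.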
In [88], Feng et al. presented a novel method based on the U-Net architecture for detecting cracks in road images. The authors added residual identity blocks to the U-Net and passed the extracted information of different layers to the final layer by adding weighted pixel values so that no original information would be lost. The authors trained their model on the CFD dataset and demonstrated that it achieved precision, recall, F1-score, and Dice coefficient of 94.29%, 99.36%, 96.76%, and 86.95%, respectively.
In [89], Shen et al. developed a deep learning framework named CrackSegNet for detecting cracks in concrete tunnels. The authors designed their model based on U-Net, adding dilated convolution layers in the encoder stage and integrating a spatial pyramid pooling (SPP) module at the end of the encoder stage. They trained their model on images collected from Zhejiang Province, China. The authors experimented with different configurations of their framework (i.e., dilated convolution, skip connection, SPP) and demonstrated that the model performed better with dilated convolution layers in terms of all evaluation metrics (IoU, precision, recall, F1-score).
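The appeal of dilated convolutions in such encoders is that they enlarge the receptive field without pooling away spatial resolution. A small sketch of the standard receptive-field arithmetic for stride-1 convolution stacks (the layer configurations below are illustrative, not CrackSegNet's):

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs.
    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

plain   = receptive_field([(3, 1), (3, 1), (3, 1)])  # three plain 3x3 convs
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])  # same depth, dilated
```

With the same depth and parameter count, the dilated stack covers more than twice the context of the plain stack, which is the property these crack segmentation networks exploit.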
In [90], Alipour et al. developed a deep learning framework based on the ResNet-18 architecture to detect cracks in multiple types of infrastructure. In this work, the authors attempted to develop a model that could achieve good accuracy on any kind of surface (i.e., concrete surface, asphalt surface). Alipour et al. presented three schemes (i.e., joint training, sequential training, ensemble training) to endow their model with adaptivity. The authors trained their model with two different datasets and showed that the joint training method obtained the highest accuracy (97.8%). They also demonstrated that their model outperformed two material-specific models (i.e., Cha et al. [35] and Eisenbach et al. [91]) in terms of accuracy.
In [96], Li et al. presented a two-stage deep learning model for detecting cracks on concrete bridges. In the first stage, the authors used smaller receptive fields (3 × 3) and smaller image sizes (18 × 18) to produce a confidence map. In the second stage, the model was the same, but the input size was the output (64 × 64) of the first stage and the receptive field was bigger (5 × 5). After producing the second confidence map, they fused it with the previous one to obtain the final predicted result. The authors used convolution layers and three densely connected layers in the DL model for extracting features. They collected 65 images from different bridges and trained the model using those images. Li et al. compared their model with STRUM and the Canny edge detector and demonstrated that their model outperformed the compared methods in terms of accuracy (99.55%) and precision (78.49%).
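The two-stage design fuses a coarse confidence map with a finer one. One simple fusion rule, shown below as an elementwise average, is an assumption for illustration only; the paper's exact fusion rule may differ:

```python
def fuse_confidence(coarse, fine):
    """Elementwise average of two per-pixel confidence maps
    (a simple stand-in for the paper's fusion rule)."""
    return [(c + f) / 2 for c, f in zip(coarse, fine)]

# hypothetical stage-1 (coarse) and stage-2 (fine) confidences
coarse = [0.9, 0.2, 0.6, 0.1]
fine   = [0.7, 0.4, 0.8, 0.1]
fused  = fuse_confidence(coarse, fine)
```

The intuition is that pixels must score highly in both stages to remain highly confident, so spurious responses from either stage alone are damped.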
In [97], Liu et al. presented a deep learning framework named NB-FCN consisting of a VGG-16 architecture and a naive Bayes decision technique. The FCN model extracts essential features to recognize and segment crack images. In addition, the authors used a naive Bayes probability fusion scheme to re-classify the crack images and reduce the false detection rate. The authors utilized a device named Bridge Substructure Detection (BSD-10) to collect images from different bridges and trained their model using the SGD algorithm. A special characteristic of this model is that it can detect cracks successfully in the presence of different kinds of surface complexities (handwriting, water stains, peel-off). Liu et al. compared their model with the CrackTree algorithm, the Random Structured Forest algorithm, and a CNN, and demonstrated that their model is superior in terms of accuracy and inference time.
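A naive Bayes fusion of independent crack probabilities can be sketched generically as a log-odds sum; this is the textbook formulation for illustration, not necessarily the exact decision rule of NB-FCN:

```python
import math

def naive_bayes_fusion(probs, prior=0.5):
    """Fuse independent per-observation crack probabilities via a
    naive Bayes log-odds sum, returning a single fused probability."""
    log_odds = math.log(prior / (1 - prior))
    for p in probs:
        p = min(max(p, 1e-6), 1 - 1e-6)  # guard against log(0)
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))  # sigmoid back to probability

# three hypothetical observations of the same region
fused = naive_bayes_fusion([0.7, 0.8, 0.6])
```

Several moderately confident observations combine into a much stronger verdict (here three scores of 0.6-0.8 fuse to about 0.93), which is how such schemes cut down single-frame false detections.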
In [98], Pan et al. detected cracks in the U-rib-to-deck welded joint area of bridges by proposing a deep learning algorithm based on VGG-Net. The authors also tested the performance of different models, including ResNet, DeepLab, and PSPNet, and demonstrated that their model performed better in terms of precision and recall.
In [99], Wang et al. presented a DL model named CrackNet for detecting cracks on 3D asphalt surfaces. The authors developed the CrackNet model with one input layer, two convolution layers, and two fully connected layers. However, the authors did not use any pooling layers; instead, they compared each pixel with its neighboring pixels to achieve pixel-level accuracy. Wang et al. trained their model with 1800 images and tested it with 200 images. They observed that their model achieved 90.13%, 87.63%, and 88.86% for precision, recall, and F1-score, respectively. They also compared their model's performance with Pixel-SVM and 3D shadow modeling and showed that their model performed better than the other two methods.
In [100], Wang et al. proposed a recurrent neural network (RNN)-based model named CrackNet-R for segmenting cracks on pavement images. The authors employed a novel recurrent unit called gated recurrent multilayer perceptron (GRMLP) instead of LSTM and GRU for obtaining deeper abstraction, as it conducted multilayer transformation at each gating unit. The authors trained their model by using the images extracted from the PaveVision3D system. After a successful training session, the model achieved 93.06% segmentation accuracy. Wang et al. compared their model with CrackNet, CrackNet-LSTM, and CrackNet-GRU and showed that their model outperformed the compared methods in terms of accuracy, precision, and recall, and it was also four times faster.
In [102], Zhang et al. presented a context-aware segmentation network for detecting cracks in concrete structures. First, the authors utilized a sliding window approach for localizing image patches, and then employed SegNet to classify crack pixels within the image patches. Finally, Zhang et al. proposed and deployed a context-aware overlapping patch fusion (CAOPF) scheme for integrating the output of every patch to generate a final output map. The authors tested their model on three different datasets and achieved F1-scores of 82.34%, 82.52%, and 79.37%, respectively.
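Patch-based pipelines like this must merge overlapping patch predictions into one full-image map. Below is a minimal 1D sketch using plain averaging; the CAOPF scheme itself weights patches by context, which is not reproduced here:

```python
def fuse_overlapping_patches(length, patches):
    """Average overlapping patch predictions into one output map.

    length:  number of pixels in the output map.
    patches: list of (start, probs) pairs, where probs covers
             pixels [start, start + len(probs)).
    """
    total = [0.0] * length
    count = [0] * length
    for start, probs in patches:
        for i, p in enumerate(probs):
            total[start + i] += p
            count[start + i] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]

# two 4-pixel patches overlapping on pixels 2-3 of a 6-pixel row
fused = fuse_overlapping_patches(6, [(0, [0.2, 0.8, 0.6, 0.4]),
                                     (2, [0.8, 0.6, 0.9, 0.1])])
```

Overlap fusion smooths out the boundary artifacts that appear when patches are simply tiled edge to edge.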
In [103], Xiang et al. presented an encoder-decoder architecture based on FCN integrated with a pyramid pooling module as well as an attention mechanism module for detecting cracks on pavement images. The authors used a pyramid pooling module for extracting global context information and an attention mechanism module for improving the representation ability of the encoder-decoder architecture. Furthermore, the authors employed dilated convolution layers for reducing the information loss due to pooling layers. Xiang et al. trained their model on three different datasets (i.e., Crack500, CrackTree200, CFD) and compared their model with CrackIT, CrackForest, FPHBN, and SegNet models. They demonstrated their model's superiority by visualizing predicted images and in terms of MPA and MIoU.
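The IoU-style metrics used in this and other papers of the survey (MIoU here, IoU and mIoU elsewhere) can be computed from flat label lists as follows (toy masks for illustration):

```python
def iou(pred, truth, cls):
    """Intersection over union for one class over flat label lists."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else 0.0

def mean_iou(pred, truth, classes=(0, 1)):
    """mIoU: per-class IoU averaged over all classes
    (here background = 0 and crack = 1)."""
    return sum(iou(pred, truth, c) for c in classes) / len(classes)

pred  = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
```

Unlike accuracy, IoU penalizes both missed crack pixels and false alarms relative to the size of the crack region, so it stays meaningful even when cracks occupy only a sliver of the image.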
In [104], Zhang et al. presented a new model named CrackNet-V by modifying the original CrackNet architecture for detecting pavement cracks at the pixel level. CrackNet-V consists of three units (i.e., a preprocessing layer, convolutional layers, and an output unit). Like the original CrackNet model, CrackNet-V does not use any pooling layers; the authors also developed a novel activation function named the leaky rectified tanh function in this work. They trained their model using the images of the PaveVision3D system and obtained 84.3% precision, 90.12% recall, and 87.12% F1-score, which is better than the original CrackNet architecture.
In [105], Wu et al. proposed a sample- and structure-guided network based on U-Net for segmenting cracks in road images. The authors introduced the structure-guided method to solve the problems of illumination variation and shadow when detecting cracks.
In [108], Song et al. designed a model named CrackSeg consisting of a multiscale dilated convolution module, an upsampling module, and some convolution as well as pooling blocks for detecting cracks at the pixel level in the presence of complex backgrounds. The authors built a new dataset with a total of 8196 images in this work and trained the model on it. They tested their model on the CFD and AgileRN datasets along with their own dataset and achieved mIoU, F1-score, recall, and precision of 73.53%, 97.92%, 97.85%, and 98.00%, respectively. They compared their model with other state-of-the-art models (i.e., CrackForest, SegNet, U-Net, Deeplabv3+, PSPNet, DeepCrack) and claimed with quantitative analysis that their model outperformed the compared methods in terms of all evaluated metrics.
In [109], Zu et al. developed a weakly supervised model based on autoencoders for detecting cracks on asphalt concrete bridge decks. The authors differentiated the data using the autoencoder and then extracted important features by deploying a K-means clustering algorithm. After that, the authors used a CNN model with an encoder-decoder structure and skip connections for segmenting the cracks. Zu et al. utilized a dataset of 46,632 images and achieved 98% accuracy after training their model on the dataset.
In [110], Yang et al. proposed a novel method named Feature pyramid and Hierarchical Boosting Network (FPHBN) for detecting cracks on pavement images. The authors designed the model with bottom-up convolutional layers, which are basically the first five layers of the VGG architecture, a feature pyramid pooling module for extracting context information of different levels, deconvolutional layers, and a hierarchical boosting module for reweighting the samples. Yang et al. trained their model with five different datasets (i.e., Crack500, GAPs384, CrackTree200, CFD, AgileRN) and introduced a new evaluation metric named AIOU. They also compared their model with HED, RCF, FCN, and CrackForest models and demonstrated that their model outperformed all of the models in terms of AIOU, ODS, and OIS for all of the datasets.
In [111], Ye et al. proposed an FCN model named Ci-Net for detecting cracks in concrete structures. In the feature extraction part, the authors used six convolutional layers and two pooling layers. In the decoder module, Ye et al. utilized six deconvolutional layers and two upsampling layers for information restoration and generating predicted images. The authors trained their model on the images of the CrackForest and TITS2016 datasets using the SGD algorithm. They demonstrated that their model achieved 84% precision, 82% recall, and 72.7% IoU. The authors also showed the model's superiority over the Canny edge detector and the Sobel operator by visualizing the predicted images.
In [112], Zhang et al. presented an improved U-Net model named CrackUnet for detecting concrete cracks at the pixel level. They proposed a total of four CrackUnet models (CrackUnet7, CrackUnet11, CrackUnet15, CrackUnet19) based on the number of convolutional layers. In this work, the authors utilized the CrackForest dataset and achieved 98.72% precision, 92.84% recall, and 95.44% F1-score. They compared the CrackUnet models and demonstrated that CrackUnet19 performed the best, even performing better than the FCN model.
From the critical analysis and Tables 13-15, it can be seen that the researchers used a wide variety of datasets. Though most of them used their own privately collected datasets, a few public benchmark datasets are also available. Table 16 presents a list of the datasets along with their access links, so that new researchers can easily find databases to start their research work in this field. In the tables, '-' denotes that the paper did not provide the particular information.
In this work, we have presented a bibliometric analysis as well as a critical analysis of a few selected papers related to image-based crack detection methods. During the bibliometric analysis, our target was to determine the research trends, influential authors, journals, publications, countries, important research terms, and collaboration patterns. We list our findings from the bibliometric analysis below.
- Refs. [35,38,39,43,44] are among the most influential publications in this research field.
- The highly influential countries are China, the USA, Germany, and Japan.
- The important research terms are crack detection, deep learning, damage detection, image processing, system algorithm, inspection, model, identification, and concrete.
In the critical analysis section of our work, we have classified the papers based on their utilized techniques and described the ins and outs of the papers. We present a list of our findings from the critical analysis below.
- Fine-tuning deep learning architectures [66], using a transfer learning scheme [37], modifying deep learning architectures by adding convolutional layers [86], adding residual identity blocks [88], and removing pooling layers [87,99] can increase the accuracy of crack detection.
- Modifying deep learning models by integrating various modules, including the MDM module [75], SCSE module [80], ASPP module [58], and attention mechanisms [103], can also increase the performance of DL methods for detecting cracks.
- Utilizing modules such as the SSE module [58] and SWT module [83] can reduce the computational complexity and the inference time of DL models.

Future Research Direction
In addition to the findings listed in the previous section, we have determined a set of research scopes and directions for future researchers by analyzing the extracted DL-based papers.
- It is our understanding that the segmentation of concrete cracks using DL techniques is going to be an engrossing research topic in this field. Researchers can focus on developing and modifying benchmark DL methods for segmenting concrete cracks with better accuracy. The design and integration of attention mechanisms, the ASPP module, the SSE module, the SCSE module, and other modules can be a promising research topic, as several research papers showed that the usage of such modules can increase accuracy.
- Very few research works [58,83] took reducing computational complexity and inference time into consideration. This can be a prominent research direction in order to develop lightweight and fast DL models and deploy them in low-cost devices, as real-time crack monitoring is important.
- Refs. [62-64,72] highlighted the presence of noise, such as shadow problems, shadings, contaminated backgrounds, road markings, rough surfaces, and variations in illumination, as challenging scenarios for detecting cracks and provided solutions. However, more research should be carried out to develop robust models that tackle these issues. As a result, this can be pointed out as a huge research scope for new researchers.
- Another important perspective is to take the class imbalance problem into consideration, as in [50]. Since only a few pixels in an image contain crack information, DL models are very likely to face the class imbalance problem, which may hamper classification accuracy. As a result, it should also be a research concern for future researchers.
- Though a few research works [49,53,81,101] have already focused on this, there is still plenty to be researched in developing algorithms to extract the geometric information of cracks from segmented images. As a consequence, researchers will be able to monitor the length, width, area, and severity level of cracks.
- Collecting data for research is always a laborious task for researchers.
New researchers in this field can reduce their efforts by focusing on collecting data using drones and vehicles, as in [66,82]. It could be even more effective if researchers follow the research direction of [5] and develop a robotic vehicle for both collecting data and detecting cracks in real time.
- As DL models are data-hungry and need plenty of labeled images for training, researchers must put a huge amount of effort into collecting and labelling images. To alleviate this, ref. [79] comes up with an interesting solution: producing training images with a GAN simulator from only one sample image.
- Refs. [74,107] showed methods for labelling images automatically by developing semi-supervised techniques using adversarial networks. New researchers can devote their efforts in this direction, as it could revolutionize the research of DL for crack detection by providing plenty of labeled data in a shorter time and with less labor.
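The geometric-feature direction above can be illustrated with a deliberately simplified sketch. Published methods typically use skeletonization and distance transforms for length and width; this toy version instead uses the vertical extent of the mask as a length proxy (an assumption for illustration):

```python
def crack_geometry(mask, pixel_mm=1.0):
    """Rough geometric features of a crack from a binary mask.

    mask:     2D list of 0/1, where 1 marks a crack pixel.
    pixel_mm: physical size of one pixel (an assumed calibration).
    """
    area = sum(sum(row) for row in mask)
    rows = [r for r, row in enumerate(mask) if any(row)]
    length = (max(rows) - min(rows) + 1) if rows else 0  # vertical extent
    width = area / length if length else 0               # mean-width proxy
    return {"area_mm2": area * pixel_mm ** 2,
            "length_mm": length * pixel_mm,
            "mean_width_mm": width * pixel_mm}

# a tiny 4x3 mask with a roughly vertical crack
mask = [[0, 1, 0],
        [0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
geo = crack_geometry(mask)
```

Even this crude version shows why segmentation outputs are so valuable: once crack pixels are isolated, length, width, and area (and hence severity) follow from simple counting, given a pixel-to-millimetre calibration.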

Discussions and Conclusions
In this article, we have presented a literature review of the existing papers on IPT-based crack detection techniques. IPTs have proved themselves to be essential parts of crack detection research. This review article has provided both scientometric and critical analysis of the prevailing papers (within the first crucial decade in this specific area of research). Bibliometric review offers several advantages over traditional systematic reviews in research evaluation and analysis. It uses objective and quantitative measures, such as citation counts and the H-index, to gauge research impact, processes large quantities of data from multiple sources, facilitates the timely identification of emerging trends, and presents complex data in visual formats. Additionally, it fosters transparency and reproducibility in research evaluation and analysis. Thus, bibliometric review is a potent tool for gauging research impact and affords researchers valuable insights into emerging trends, research gaps, and potential collaborations in their field. This work performs bibliometric analysis to determine the influential authors, publications, geographical locations, modern research trends, and possible future research directions in this field. This was carried out so that researchers can become familiar with the pioneers in this field, can follow the prominent publications to gather knowledge, can be aware of the sources in which to publish their articles to receive more attention, and can be aware of possible future research directions to pursue further research. Based on the scientometric analysis conducted in this work, it is found that [35] is the most influential publication, "Computer-Aided Civil and Infrastructure Engineering" is the most popular journal, and Ying Chen, Zhong Qu, and Weigang Zou are among the pioneering researchers in this field.
It can be seen that, among various technologies, DL-based techniques have contributed to a booming spike in the prosperity curve of crack detection applications. Furthermore, the keyword timeline analysis shows that DL is considered the most modern technology in this field today.
Furthermore, this work has presented a thorough survey of a few scrutinized DL-based papers and has also abridged many essential insights from them. Moreover, some captivating directive research questions have been raised as an adjunct to the primary findings from the reviewed articles. This article articulates the answers to questions related to the robustness and viability of various papers and DL techniques in this research area. In addition, this research work lists a few benchmark datasets extracted from the DL-based papers along with their links so that new researchers can easily find the necessary data to start their research in this field. It is our understanding that the segmentation of concrete cracks using DL techniques is going to be an engrossing research topic in this field. By means of feasible outcomes and practical applications, segmentation can spearhead DL practice in concrete crack detection. It would be a rational move for researchers to channel their work toward crack segmentation utilizing DL techniques. Researchers should focus on developing modified DL architectures, integrating various modules, and introducing loss functions to increase pixel segmentation accuracy. In addition, researchers should focus on reducing computational complexity in order to implement DL models on low-cost devices for real-time monitoring. Furthermore, it would be beneficial if they used segmented images to extract the geometric features of the cracks. We hope that researchers from both academia and industry will receive enough critical information and knowledge on DL-based crack detection techniques from this work that they will be able to contribute to this domain by incorporating this information into their research works.