Article

Large AI Models for Building Material Counting Task: A Comparative Study

1 College of Civil Engineering and Architecture, Xinjiang University, Urumqi 830047, China
2 China United Engineering Co., Ltd., Hangzhou 310052, China
3 College of Civil Engineering, Tongji University, Shanghai 200092, China
* Author to whom correspondence should be addressed.
Buildings 2025, 15(16), 2900; https://doi.org/10.3390/buildings15162900
Submission received: 3 July 2025 / Revised: 13 August 2025 / Accepted: 14 August 2025 / Published: 15 August 2025
(This article belongs to the Special Issue The Application of Intelligence Techniques in Construction Materials)

Abstract

The rapid advancement of general large models has significantly impacted the traditional “one task, one model” research paradigm in construction automation and introduced new concepts to it. In this paper, we evaluate the performance of existing large models, as well as models developed on large model platforms, using building material counting as an example. We compare three categories of large AI models for building material counting: multimodal large models, purely visual large models, and secondary models developed on platforms. Through this research, we aim to explore the accuracy and practicality of these models in real-world construction scenarios. The results indicate that directly applying general large models faces challenges in processing photos with complex shapes or backgrounds, and fails to provide accurate counting results. Additionally, while purely visual large models excel in instance segmentation tasks, applying them to the specific counting of building materials requires additional programming work. To address these issues, this study explores solutions based on large model secondary development platforms and trains a model using EasyDL as an example. Leveraging deep learning techniques, this model achieves effective counting of building materials through five steps: data preparation, model type selection, model training, model validation, and model deployment. Although models developed on large model platforms are presently less accurate than specialized models, they still represent a highly promising approach.

1. Introduction

The construction industry plays a crucial role in global socio-economic development and urbanization processes [1]. Building materials such as steel bars, steel pipes, I-beams, channel steel, angle irons, and wooden beams form the foundation of construction projects and directly impact the safety performance and lifespan of buildings. Simultaneously, due to their diverse types and large quantities, building materials pose high demands on the precision and efficiency of material management. Therefore, enhancing the management level of building materials, particularly through the realization of intelligent identification and automated detection of construction materials [2], not only helps ensure project progress and quality, achieving cost reduction and efficiency improvement for enterprises, but also aligns with the trend of digital and intelligent transformation in the construction industry [3]. Currently, building material counting methods based on deep learning object detection technology have achieved significant results, especially in image analysis [4,5]. Nevertheless, common issues with all these methods include the following: (1) their heavy reliance on self-developed datasets of target building materials, which require extensive human and material resources for annotation; (2) the need for a dedicated database for each type of building material (such as rebar, steel pipes, I-beams, etc.); (3) the necessity for a specialized counting model for each material type due to its distinct cross-sectional characteristics, i.e., a “one material, one model” or “one task, one model” solution strategy. These issues result in the need to repeat the steps of data collection, annotation, model training, and deployment for each new material, making the process cumbersome and costly.
At the end of 2022, OpenAI launched the Chat Generative Pre-trained Transformer (ChatGPT) [6], a large pre-trained language model based on the Transformer architecture [7]. This model not only demonstrates powerful contextual understanding and high-quality text generation capabilities, but can also handle complex tasks such as code writing, logical reasoning, and literary creation [8]. Upon its release, the product quickly became a global sensation, highlighting the immense developmental potential of AI technology. It has since been widely applied in various vertical industries, including education, healthcare, finance, business operations, and everyday office work [9]. Additionally, supported by massive amounts of data and large-scale computing power, the model’s parameter count has grown significantly, allowing for more diverse input modalities. It has expanded from single-text information to multi-modal inputs, including text, images, audio, and video [10]. This comprehensive and rich information perception enables the model to understand user needs more accurately and deeply. Naturally, the following question arises: Can large AI models ultimately solve the problem of building material counting?
To answer this question, we aim to investigate the direct application of large AI models in building material counting tasks. By evaluating the performance of existing large AI models when directly applied to these tasks, we seek to determine whether large-scale models can replace the traditional “one task, one model” strategy. We compare the performance of different types of models: multimodal models, purely visual models, and secondary models developed on platforms. We hope this study will provide the construction industry with a reference guide for selecting appropriate AI technologies to improve material management efficiency. In this context, we organize the paper as follows. Section 2 summarizes the currently available large AI models. Section 3 records the application of these models to the task of building material counting and evaluates their performance. Section 4 discusses the development of new models based on the large AI models and presents the results. Finally, Section 5 concludes this paper with several important findings.

2. A Survey of Large AI Models

Currently, large AI models that are available for building material counting can be classified into three categories: multimodal large models, purely visual large models, and platforms on which secondary models can be developed. We investigated and compiled these three types of large models as of 1 May 2024. Given the rapid development in this field, newer models have certainly appeared since then and are not included in the following summary tables.

2.1. Multimodal Large Models

A multimodal large model refers to a neural network model capable of handling multiple types of input and possessing billions or even more parameters. It can process various types of data, including text, audio, images, and videos. Unlike traditional deep learning models that can only handle a single type of data, multimodal large models can integrate different types of data, extracting and combining information to achieve better prediction and reasoning [11]. Table 1 summarizes twenty multimodal large models we have collected. The table also provides information on the names of these large models, their developers, release dates, parameter counts, main architectures, whether they are open source, web addresses, and their primary application areas.

2.2. Purely Visual Large Models

A purely visual large model focuses on processing and analyzing visual data, such as images and video materials. Leveraging its vast number of parameters, the model can capture and learn complex visual patterns, performing various visual tasks including image recognition, classification, object detection, semantic segmentation, pose estimation, and image generation [32]. The performance of a purely visual large model is highly dependent on annotated data. Only with training and optimization based on large amounts of high-quality data can the model achieve high-precision visual recognition and analysis. Table 2 shows five purely visual large models along with their names, developers, and web addresses.

2.3. Large Model Platforms

The large models listed in Table 1 and Table 2 are typically more suitable for common scenarios and general tasks. Their practical effectiveness in specific industry-targeted scenarios may be limited by factors such as the scarcity of publicly available data, the sensitivity of industry data, and insufficient local deployment capabilities. To address these challenges, several AI giants have launched platforms on which specialized models can be developed by leveraging pre-trained large models, for example, Vertex AI [38] and EasyDL [39]. Vertex AI, launched by Google, is a comprehensive machine learning platform that integrates data engineering, data science, and machine learning engineering for training and deploying machine learning models and AI applications. EasyDL, launched by Baidu, is a low-code, user-friendly platform for training and serving AI models. It allows users to quickly build and deploy machine learning models through a simplified interface and minimal code. Because the EasyDL platform provides an end-to-end AI solution covering the entire process from model design to deployment, we selected it as a representative platform on which to develop a secondary model for building material counting.

3. Performance Evaluation

3.1. Multimodal Large Models

We selected 10 top-ranked large models from those listed in Table 1 for material counting tests. As shown in Table 3, the final selection was primarily based on the ranking results from authoritative leaderboards, including Hugging Face’s Open LLM Leaderboard and Shanghai AI Laboratory’s OpenCompass [40,41,42]. These leaderboards provide comprehensive quantitative evaluations of large models across dimensions such as knowledge, language understanding, reasoning, mathematics, and coding. While the rankings mainly cover open-source models, we also included several well-known closed-source models due to their strong performance and significant impact on the industry. This also helps in understanding the gap between open-source and closed-source models in the field of object detection.
To evaluate the performance of the selected models, we conducted a series of question-and-answer tests using photos of building materials, as shown in Figure 1. Each photo contains a pile of a certain building material, e.g., rebar, square steel pipes, I-beams, or wooden beams. It is worth emphasizing that all the test photos were taken at real construction sites. Initially, we asked each model two questions: (1) What building material is in the photo? (2) How many building materials are there? We found that for materials with complex shapes, or for photos with complicated backgrounds, most models were unable to accurately answer the first question. Therefore, we standardized the question to focus on the quantity of a known material, such as “How many steel bars are in the photo?”.
Notably, to make better use of the models’ capabilities, we also experimented with prompting strategies such as Chain of Thought [43], few-shot prompting [44], and AI-based prompt generation [45]. In practice, however, we found that different images and tasks require continuously enriching the prompts and adjusting their order. This makes the process more tedious and significantly increases the time required for model inference, which runs counter to our aim of rapid material counting. Therefore, we only present the test results obtained with the most basic prompting strategies in this paper.
To ensure the reproducibility and consistency of the experiments, we set the “temperature” parameter for all models to 0, which guarantees that the same images produce identical results across multiple tests. Using this setting, we conducted tests on the photos, and the typical results are presented in Table 3. The test results indicate that even for photos where the materials are arranged in a relatively regular and orderly manner, all large models fail to provide accurate counting results.
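For readers who want to reproduce such tests programmatically, the following sketch shows how a vision-capable chat model can be queried with a counting prompt at temperature 0 through the OpenAI Python client. The model name, prompt wording, and image path are illustrative assumptions; they are not the exact settings used in our experiments.

```python
import base64
from openai import OpenAI  # official openai Python package (v1 interface)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def count_material(image_path: str, material: str) -> str:
    """Ask a multimodal model how many pieces of a known material appear in a photo."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",      # illustrative model name
        temperature=0,       # deterministic output so repeated tests give identical answers
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"How many {material} are in the photo? Answer with a single number."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(count_material("rebar_site_photo.jpg", "steel bars"))
```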

3.2. Purely Visual Large Models

The purely visual large models cannot be directly used to count the number of building materials in a photo. Instead, we tested the ability of those models to segment objects from photos. Among the five purely visual large models in Table 2, we chose the Segment Anything Model (SAM) for this evaluation due to its state-of-the-art performance in instance segmentation tasks and its support for multiple interaction strategies.
Three different interaction strategies were adopted to test the segmentation performance of SAM: interactive point selection, interactive box selection, and automatic segmentation. The specific operations for each strategy are as follows: (1) Interactive Point Selection: Input photos and select interactive points on the target building materials; SAM is then guided to recognize and segment them; (2) Interactive Box Selection: Use a rectangular box to select an area containing the target building materials; the model will recognize and segment based on this selected area; (3) Automatic Segmentation: Without relying on any manual interaction, the model automatically recognizes and segments the rebar, circular steel tubes, square steel pipes, I-beams, wooden beams, and wheel fasteners in the photos.
The segmentation results of SAM are shown in Table 4. The results obtained with the point and box selection strategies are closer to semantic segmentation: stacked building materials are treated as a whole and a single mask is output. In contrast, the automatic segmentation strategy can generate independent masks for individual building materials stacked together, but it tends to have a higher rate of missed detections when dealing with densely arranged materials.
These purely visual large models essentially perform class-agnostic object segmentation; they do not distinguish steel from other items. Although the segmentation results look good by visual inspection in many cases, additional programming work is still required to identify which segments correspond to steel and then count them, which reduces overall work efficiency.
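For reference, the sketch below shows how the three interaction strategies can be driven with Meta AI’s open-source segment-anything package [33]. The checkpoint path and prompt coordinates are illustrative assumptions, and using the number of automatically generated masks as a count is only a starting point: the filtering step mentioned above is still required to keep only the masks that correspond to the target material.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# Load a pre-trained SAM checkpoint (path is an assumption for this sketch).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
image = cv2.cvtColor(cv2.imread("rebar_site_photo.jpg"), cv2.COLOR_BGR2RGB)

# Strategy 1: interactive point selection (one foreground click on a rebar section).
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 320]]),  # pixel coordinates of the click (illustrative)
    point_labels=np.array([1]),           # 1 = foreground point
)

# Strategy 2: interactive box selection (a rectangle enclosing the stacked materials).
masks_box, _, _ = predictor.predict(box=np.array([100, 80, 900, 640]))

# Strategy 3: automatic segmentation of everything in the image.
mask_generator = SamAutomaticMaskGenerator(sam)
auto_masks = mask_generator.generate(image)

# The number of automatic masks is only a crude upper bound on the material count;
# background objects must still be filtered out before counting.
print(f"Automatically generated masks: {len(auto_masks)}")
```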

4. Secondary Model Developed Based on EasyDL

Since the multimodal and purely visual large models evaluated in Section 3 did not meet expectations for image-based building material counting, this section explores the development of secondary models on the EasyDL platform. The aim is to test how well models built from pre-trained large models via transfer learning perform when only a small amount of data is available.

4.1. Brief Introduction of EasyDL

The development of deep learning models is a complex and tedious process, including, but not limited to, software environment setup, model selection, hyperparameter tuning, model training, and model deployment. To simplify the AI development process, Baidu launched the EasyDL platform in November 2017. This platform aims to lower the technical barrier and provide developers with integrated AI development services. EasyDL integrates core technologies from Baidu’s PaddlePaddle framework and offers a series of carefully selected pre-trained models. These models are trained on Baidu’s large-scale multimodal datasets and cover both visual and textual domains. Through transfer learning, the EasyDL platform can support effective model training even with limited data, achieving over 90% detection accuracy in some object detection tasks.
For researchers with experience in AI model development, we recommend the PaddlePaddle EasyDL Desktop Edition. This version provides two primary modeling approaches: adjusting preset model parameters and custom modeling using Notebooks. These features enable developers to adjust models based on their own experience, allowing them to obtain solutions which are better suited for specific application scenarios. Additionally, users can export trained models along with detailed configuration information during the deployment phase. Notably, the EasyDL Desktop Edition includes the ability to convert graphical user interface (GUI) operations into code. We have found that this capability not only enhances the reproducibility of the entire workflow, but also offers greater potential for subsequent optimization of model performance.

4.2. Development Workflow of Secondary Model Based on EasyDL

The following five steps are necessary to develop a specific model based on EasyDL.
Step 1: Data preparation. To start a new job, users can create a dataset and upload data to it. EasyDL also provides an automatic data augmentation function to perturb and expand the data to some extent.
Step 2: Model type selection. Currently, the EasyDL Image Processing category supports the development of three types of secondary AI models: image classification, object detection, and image segmentation. Here, we chose to create an “Object Detection” model for counting building materials. For fair comparison purposes, we utilized the same dataset as Li et al. [46], which contains 74,824 rebar sections from real construction sites, without any data augmentation.
Step 3: Model training. Users can select from a variety of pre-trained models offered by EasyDL, tailored to meet specific application requirements such as accuracy and inference speed. The model can be trained using either the default parameters or customized settings, including learning rate, batch size, and number of iterations. Throughout the training process, users can monitor real-time performance metrics, such as accuracy, recall, and loss function values, to track the model’s progress.
Step 4: Model validation. Users can assess the model’s performance through the final evaluation report or validation process. If the results are unsatisfactory, users have the option to iterate on the model by expanding the dataset or refining annotations to improve performance.
Step 5: Model deployment. Upon completing the online model training and validation, users can deploy the model by choosing a customized method, such as public cloud deployment, local server deployment, deployment on general-purpose small devices, integrated hardware–software solutions, or deployment via browsers or mini-programs. Once the model is published, it becomes accessible to other users through an API connection.
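As a rough illustration of Step 5, the snippet below sketches how a model published to the public cloud might be called over HTTP and its detections counted. The endpoint path, request fields, and response format follow the general pattern of Baidu AI Cloud APIs but are assumptions for this sketch; the exact interface details are generated by EasyDL when the model is published.

```python
import base64
import requests

# Assumed values for this sketch: the real URL and token are provided by EasyDL
# when the trained model is published as a public cloud service.
API_URL = "https://aip.baidubce.com/rpc/2.0/ai_custom/v1/detection/rebar_counter"
ACCESS_TOKEN = "<access token obtained from the Baidu AI Cloud console>"

def count_rebars(image_path: str, threshold: float = 0.3) -> int:
    """Send an image to the deployed detection model and count the returned boxes."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        API_URL,
        params={"access_token": ACCESS_TOKEN},
        json={"image": image_b64, "threshold": threshold},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])  # assumed field: list of detected boxes
    return len(results)

print(count_rebars("rebar_site_photo.jpg"))
```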

4.3. Performance Comparison for Rebar Detection

Li et al. [46] proposed a real-time rebar detection model based on YOLOv3 in a paper published in Automation in Construction. They implemented three key improvements to the original YOLOv3 network: (1) adding an extra feature pyramid network to enhance the recognition capability for objects of different scales; (2) utilizing a combination of Intersection over Union (IoU) loss and focal loss functions to optimize localization accuracy and classification performance; and (3) integrating a series of performance enhancement strategies (bag of freebies) without additional cost. These enhancements resulted in an average precision of 99.71% for rebar detection at IoU = 0.5.
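For reference, the two loss terms mentioned in improvement (2) are commonly written in the following standard forms (these are the generic definitions, not necessarily the exact variants tuned in [46]):

```latex
% Standard focal loss and IoU loss (generic forms)
\mathcal{L}_{\mathrm{focal}}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \log(p_t), \qquad
\mathcal{L}_{\mathrm{IoU}} = 1 - \frac{|B_{\mathrm{pred}} \cap B_{\mathrm{gt}}|}{|B_{\mathrm{pred}} \cup B_{\mathrm{gt}}|}
```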
For a fair comparison, this study utilized the same dataset as Li et al. [46] without any data augmentation; it contains 74,824 rebar sections from real construction sites. Moreover, we adopted a standard training method and a high-performance general algorithm for model training in EasyDL. The configuration used the default settings, with automatic hyperparameter search and advanced training options disabled. The training environment consisted of a single Tesla P4 GPU with 8 GB of memory, a 12-core CPU, and 40 GB of RAM, providing a total computing power of 5.5 TeraFLOPS. Using the same dataset, the model trained on the EasyDL platform achieved a highest accuracy of 90.96% at an IoU of 0.5 (Figure 2). Table 5 presents examples of the model on the validation set, where correctly identified rebars are marked with a green mask, incorrectly identified rebars with a red mask, and missed rebars with an orange mask. The detection results of Li’s model are also shown. It is clear that Li’s model performs better than the secondary AI model that we developed using EasyDL.
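The accuracy figures reported here follow the usual detection-evaluation convention: a predicted box counts as a true positive when its IoU with an unmatched ground-truth box reaches the threshold. The helper below is a minimal sketch of that matching logic under a greedy one-to-one matching assumption; it is not the evaluation code used internally by EasyDL or by [46].

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def f1_at_iou(pred_boxes, gt_boxes, iou_thr=0.5):
    """Greedy matching of predictions to ground truth; returns precision, recall, F1."""
    unmatched_gt = list(gt_boxes)
    tp = 0
    for p in pred_boxes:
        best = max(unmatched_gt, key=lambda g: box_iou(p, g), default=None)
        if best is not None and box_iou(p, best) >= iou_thr:
            tp += 1
            unmatched_gt.remove(best)
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f1
```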
One interesting feature of EasyDL is that it provides specific optimization suggestions to further improve the model’s performance. For the rebar detection model, EasyDL first identified the affected performance metrics and their importance levels and conducted an in-depth root cause analysis, pointing out that color bias, saturation changes, and target box size were the main factors affecting model accuracy and causing false detections. These factors were quantified within different characteristic ranges, and the corresponding optimization strategies are listed in Table 6. For instance, it suggests adjusting the color processing methods in the data augmentation strategy (e.g., using “Color, Posterize” and simple “Color” adjustments) to mitigate color bias and saturation issues. It also suggests exploring higher-precision models, implementing small object detection techniques, or using Baidu Machine Learning (BML) for further optimization strategies to reduce accuracy fluctuations caused by target size. After adopting the suggested data augmentation strategies and training with a higher-precision model, the model’s AP improved to 93.36%, an increase of 2.40 percentage points. This indicates that the optimization suggestions provided by EasyDL are effective, enabling the model to achieve better performance in rebar detection tasks.
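The “Color” and “Posterize” operations recommended by EasyDL correspond to standard image transformations. A minimal offline equivalent using Pillow is sketched below; the enhancement factor and bit depth are illustrative values, not the settings EasyDL applies internally.

```python
from PIL import Image, ImageEnhance, ImageOps

def color_posterize_augment(path: str, color_factor: float = 0.7, bits: int = 4) -> Image.Image:
    """Apply a color (saturation) adjustment followed by posterization."""
    img = Image.open(path).convert("RGB")
    img = ImageEnhance.Color(img).enhance(color_factor)  # <1 desaturates, >1 saturates
    img = ImageOps.posterize(img, bits)                  # reduce each channel to `bits` bits
    return img

color_posterize_augment("rebar_site_photo.jpg").save("rebar_aug.jpg")
```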

4.4. Performance Comparison for Other Building Materials

Following the same procedure, we developed secondary AI models based on EasyDL to count various building materials, including circular steel tubes, square steel tubes, I-beams, wooden beams, and wheel fasteners. For each material, datasets collected by the authors’ group were used, and the performance of the final models was compared with that of models developed by the authors’ group using the YOLO framework on the same datasets [47,48]. Table 7 shows some examples. For the five materials considered here, the EasyDL models achieve an AP50 (Average Precision at IoU = 0.50) ranging from 86.8% to 98.8%, whereas the models developed by the authors’ team achieve 91.4% to 99.4%. The comparison indicates that AI counting models specifically developed for each type of building material outperform the models developed secondarily on EasyDL. Nevertheless, the secondarily developed models also demonstrate high counting accuracy together with convenient development and deployment.

4.5. Discussion

In this section, we developed and trained building material counting models on the EasyDL platform based on large pre-trained models. While our experimental results show that these models demonstrate a certain level of generalization and adaptability, there is still room for improvement in accuracy for specific application scenarios. The following subsections discuss the strengths, practical significance, and limitations of this research.

4.5.1. Advantages and Practical Significance of the Research

Our research found that using pre-trained large-scale models for transfer learning is highly effective. This is especially true in situations where construction site conditions are complex, and it is difficult to collect a large number of building material data samples that meet training requirements. By applying transfer learning techniques, we were able to achieve high model performance even with limited data. This approach offers a more convenient and efficient solution for scenarios lacking large training datasets and AI expertise.
Additionally, by utilizing cloud-based platforms, researchers can focus on algorithm optimization without dealing with the complexities and costs of hardware infrastructure and maintenance. This approach not only accelerates the research and development process but also simplifies model updates. Consequently, in industries like construction, where rapid adaptation to market changes is crucial, we believe this cloud-based development model offers significant potential.

4.5.2. Limitations and Future Directions

Despite the model’s excellent scalability and ease of use, its accuracy may not meet the required standards when confronted with common machine vision challenges, such as occlusion or poor lighting conditions. Furthermore, reliance on existing datasets limits the system’s generalization ability, particularly when encountering new types of building materials, leading to poor recognition performance. To address these challenges, future research could focus on the following areas:
  • Data Augmentation and Quality Improvement: Increase the diversity of samples to cover different types of building materials and ensure sufficient sample sizes for each category. Additionally, data preprocessing techniques, such as applying data augmentation, could be improved to enhance the model’s robustness in diverse conditions.
  • Model Architecture Innovation: Explore new network architectures better suited for the task of counting building materials. For instance, incorporating attention mechanisms and other advanced feature extraction techniques could improve the accuracy of key region identification.
  • Automatic Online Updates: Utilize new data collected from the deployed models to continuously update and refine the existing model, thereby gradually improving its prediction accuracy.

4.5.3. Ethical and Operational Risks Reminders

Although secondary transfer learning based on large pre-trained models can significantly improve modeling efficiency, two types of risks should be noted when deploying on construction sites:
  • Data leakage risks: Original images uploaded to cloud platforms may contain project-sensitive information. It is recommended to blur or crop QR codes, project nameplates, background buildings, and similar content in the images before uploading (a brief sketch is given after this list).
  • Over-reliance risks: Large models still have missed detections and misjudgments in counting regular materials such as steel bars and wooden beams. Therefore, the model output results must undergo manual spot checks and reviews by on-site material staff. Especially in key business links involving settlement and payment, a “model initial counting, manual verification, difference tracing” closed-loop process should be established to ensure the accuracy of quantities.
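As a brief illustration of the first risk, sensitive regions can be blurred before upload with a few lines of OpenCV; the rectangle coordinates below are placeholders that would in practice come from manual review or an automatic detector.

```python
import cv2

def blur_regions(image_path: str, regions, out_path: str) -> None:
    """Blur rectangular regions (x, y, w, h) such as QR codes or project nameplates."""
    img = cv2.imread(image_path)
    for (x, y, w, h) in regions:
        roi = img[y:y + h, x:x + w]
        img[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    cv2.imwrite(out_path, img)

# Placeholder coordinates; in practice these come from manual review or a detector.
blur_regions("site_photo.jpg", [(120, 60, 200, 200)], "site_photo_blurred.jpg")
```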

5. Concluding Remarks

In this paper, we collected as many currently available large deep learning models as possible, tested their effectiveness on the problem of counting building materials, and evaluated the performance of counting models developed based on these large models. Our study indicates that although existing large models exhibit strong adaptability across multiple materials, they still show significant deficiencies in counting accuracy, making them unsuitable for direct application. The accuracy of the secondary counting model developed based on EasyDL is slightly lower than that of meticulously fine-tuned specialized models, but it demonstrates excellent scalability, updatability, and development and deployment efficiency, showing the potential to become the main modeling approach in the future. Additionally, the current research mainly focuses on the accuracy evaluation of image-based counting. In future work, we will explore the integration of multimodal information, such as text, images, audio, and video, to further enhance the accuracy of building material counting.

Author Contributions

Conceptualization, J.C. and Y.C.; methodology, Y.L., J.C. and Y.C.; software, S.L., Q.H. and Z.F.; validation, S.L., Y.C., Q.H. and Z.F.; formal analysis, Y.C. and J.C.; investigation, Y.L., Y.C. and Q.H.; writing—original draft preparation, Y.C.; writing—review and editing, J.C. and Y.L.; supervision, J.C. and Y.L.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 52178151) and Tongji University Cross-Discipline Joint Program (2022-3-YB-06).

Data Availability Statement

Some or all data, models, or codes supporting the findings of this study are available from the corresponding authors upon reasonable request. The datasets are available online at https://github.com/H518123 (accessed on 12 August 2025) after the paper is published.

Conflicts of Interest

Author Yang Li was employed by the company China United Engineering Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. You, Z. Intelligent construction: Unlocking opportunities for the digital transformation of China’s construction industry. Eng. Constr. Archit. Manag. 2024, 31, 1429–1453. [Google Scholar] [CrossRef]
  2. Baduge, S.K.; Thilakarathna, S.; Perera, J.S.; Arashpour, M.; Sharafi, P.; Teodosio, B.; Shringi, A.; Mendis, P. Artificial intelligence and smart vision for building and construction 4.0: Machine and deep learning methods and applications. Autom. Constr. 2022, 141, 104440. [Google Scholar] [CrossRef]
  3. Pan, Y.; Zhang, L. Roles of artificial intelligence in construction engineering and management: A critical review and future trends. Autom. Constr. 2021, 122, 103517. [Google Scholar] [CrossRef]
  4. Liu, H.; Wang, D.; Xu, K.; Zhou, P.; Zhou, D. Lightweight convolutional neural network for counting densely piled steel bars. Autom. Constr. 2023, 146, 104692. [Google Scholar] [CrossRef]
  5. Shin, Y.; Heo, S.; Han, S.; Kim, J.; Na, S. An image-based steel rebar size estimation and counting method using a convolutional neural network combined with homography. Buildings 2021, 11, 463. [Google Scholar] [CrossRef]
  6. Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.L.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. arXiv 2022, arXiv:2203.02155. [Google Scholar] [CrossRef]
  7. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  8. Floridi, L.; Chiriatti, M. GPT-3: Its nature, scope, limits, and consequences. Minds Mach. 2020, 30, 681–694. [Google Scholar] [CrossRef]
  9. Yenduri, G.; Ramalingam, M.; Selvi, G.C.; Supriya, Y.; Srivastava, G.; Maddikunta, P.K.R.; Raj, G.D.; Jhaveri, R.H.; Prabadevi, B.; Wang, W.; et al. GPT (Generative Pre-Trained Transformer)—A comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. IEEE Access 2024, 12, 54608–54649. [Google Scholar] [CrossRef]
  10. Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.-L.; Tang, Y. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA J. Autom. Sin. 2023, 10, 1122–1136. [Google Scholar] [CrossRef]
  11. Xu, P.; Zhu, X.; Clifton, D.A. Multimodal learning with transformers: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12113–12132. [Google Scholar] [CrossRef] [PubMed]
  12. Chen, Z.; Wang, W.; Tian, H.; Ye, S.; Gao, Z.; Cui, E.; Tong, W.; Hu, K.; Luo, J.; Ma, Z.; et al. How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites. Sci. China Inf. Sci. 2024, 67, 220101. [Google Scholar] [CrossRef]
  13. Guo, Z.; Xu, R.; Yao, Y.; Cui, J.; Ni, Z.; Ge, C.; Chua, T.-S.; Liu, Z.; Huang, G. LLaVA-UHD: An LMM Perceiving Any Aspect Ratio and High-Resolution Images. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2025; Volume 15141, pp. 390–406. [Google Scholar] [CrossRef]
  14. OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
  15. Bai, J.; Bai, S.; Yang, S.; Wang, S.; Tan, S.; Wang, P.; Lin, J.; Zhou, C.; Zhou, J. Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond. arXiv 2023, arXiv:2308.12966. [Google Scholar]
  16. Shan, B.; Yin, W.; Sun, Y.; Tian, H.; Wu, H.; Wang, H. ERNIE-ViL 2.0: Multi-view contrastive learning for image-text pre-training. arXiv 2022, arXiv:2209.15270. [Google Scholar]
  17. Wang, W.; Lv, Q.; Yu, W.; Hong, W.; Qi, J.; Wang, Y.; Ji, J.; Yang, Z.; Zhao, L.; Xuan, S. CogVLM: Visual expert for pretrained language models. Adv. Neural Inf. Process. Syst. 2024, 37, 121475–121499. [Google Scholar]
  18. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. arXiv 2021, arXiv:2103.00020. [Google Scholar] [CrossRef]
  19. Li, Z.; Yang, B.; Liu, Q.; Ma, Z.; Zhang, S.; Yang, J.; Sun, Y.; Liu, Y.; Bai, X. Monkey: Image resolution and text label are important things for large multi-modal models. arXiv 2023, arXiv:2311.06607. [Google Scholar]
  20. Girdhar, R.; El-Nouby, A.; Liu, Z.; Singh, M.; Alwala, K.V.; Joulin, A.; Misra, I. Imagebind: One embedding space to bind them all. arXiv 2023, arXiv:2305.05665. [Google Scholar] [CrossRef]
  21. Gemini Team Google. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv 2024, arXiv:2403.05530. [Google Scholar] [CrossRef]
  22. 01.AI. Yi: Open foundation models by 01.AI. arXiv 2024, arXiv:2403.04652.
  23. Hu, J.; Yao, Y.; Wang, C.; Wang, S.; Pan, Y.; Chen, Q.; Yu, T.; Wu, H.; Zhao, Y.; Zhang, H.; et al. Large multilingual models pivot zero-shot multimodal learning across languages. arXiv 2023, arXiv:2308.12038. [Google Scholar]
  24. XVERSE-V-13B. Available online: https://github.com/xverse-ai/XVERSE-V-13B (accessed on 3 July 2025).
  25. Li, J.; Li, D.; Savarese, S.; Hoi, S. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv 2023, arXiv:2301.12597. [Google Scholar]
  26. Huo, Y.; Zhang, M.; Liu, G.; Lu, H.; Gao, Y.; Yang, G.; Wen, J.; Zhang, H.; Xu, B.; Zheng, W.; et al. WenLan: Bridging vision and language by large-scale multi-modal pre-training. arXiv 2021, arXiv:2103.06561. [Google Scholar]
  27. Hunyuan AI. Available online: https://hunyuan.tencent.com (accessed on 3 July 2025).
  28. Sun, Q.; Wang, J.; Yu, Q.; Cui, Y.; Zhang, F.; Zhang, X.; Wang, X. EVA-CLIP-18B: Scaling CLIP to 18 billion parameters. arXiv 2024, arXiv:2402.04252. [Google Scholar]
  29. Ye, Q.; Xu, H.; Ye, J.; Yan, M.; Hu, A.; Liu, H.; Qian, Q.; Zhang, J.; Huang, F.; Zhou, J. mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration. arXiv 2023, arXiv:2311.04257. [Google Scholar]
  30. Xinghuo AI. Available online: https://xinghuo.xfyun.cn (accessed on 3 July 2025).
  31. The Claude 3 Model Family: Opus, Sonnet, Haiku. Available online: https://www.anthropic.com/news/claude-3-haiku?ref=ai-recon.ghost.io (accessed on 3 July 2025).
  32. Wang, J.; Liu, Z.; Zhao, L.; Wu, Z.; Ma, C.; Yu, S.; Dai, H.; Yang, Q.; Liu, Y.; Zhang, S.; et al. Review of large vision models and visual prompt engineering. Meta-Radiology 2023, 1, 100047. [Google Scholar] [CrossRef]
  33. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment anything (SA) project: A new task, model, and dataset for image segmentation. arXiv 2023, arXiv:2304.02643. [Google Scholar]
  34. Ke, L.; Ye, M.; Danelljan, M.; Liu, Y.; Tai, Y.-W.; Tang, C.-K.; Yu, F. Segment anything in high quality. arXiv 2023, arXiv:2306.01567. [Google Scholar] [CrossRef]
  35. Zhao, X.; Ding, W.; An, Y.; Du, Y.; Yu, T.; Li, M.; Tang, M.; Wang, J. Fast segment anything. arXiv 2023, arXiv:2306.12156. [Google Scholar] [CrossRef]
  36. Wang, X.; Zhang, X.; Cao, Y.; Wang, W.; Shen, C.; Huang, T. SegGPT: Segmenting everything in context. arXiv 2023, arXiv:2304.03284. [Google Scholar] [CrossRef]
  37. Zou, X.; Yang, J.; Zhang, H.; Li, F.; Li, L.; Wang, J.; Wang, L.; Gao, J.; Lee, Y.J. Segment everything everywhere all at once. arXiv 2023, arXiv:2304.06718. [Google Scholar] [CrossRef]
  38. Google Cloud Vertex AI. Available online: https://cloud.google.com/vertex-ai (accessed on 3 July 2025).
  39. Baidu AI EasyDL. Available online: https://ai.baidu.com/easydl (accessed on 3 July 2025).
  40. Hugging Face Open LLM Leaderboard. Available online: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard (accessed on 3 July 2025).
  41. Shanghai AI Laboratory OpenCompass Leaderboard. Available online: https://rank.opencompass.org.cn/home (accessed on 3 July 2025).
  42. Zhang, D.; Yu, Y.; Dong, J.; Li, C.; Su, D.; Chu, C.; Yu, D. MM-LLMs: Recent advances in multimodal large language models. arXiv 2024, arXiv:2401.13601. [Google Scholar]
  43. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2022, arXiv:2201.11903. [Google Scholar]
  44. Wang, Z.; Sun, Q.; Zhang, B.; Wang, P.; Zhang, J.; Zhang, Q. PM2: A new prompting multi-modal model paradigm for few-shot medical image classification. arXiv 2024, arXiv:2404.08915. [Google Scholar]
  45. Automatically Generate First Draft Prompt Templates Anthropic. Available online: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prompt-generator (accessed on 3 July 2025).
  46. Li, Y.; Lu, Y.; Chen, J. A deep learning approach for real-time rebar counting on the construction site based on YOLOv3 detector. Autom. Constr. 2021, 124, 103602. [Google Scholar] [CrossRef]
  47. Chen, J.; Chen, W.; Li, Y. Intelligent real-time counting of construction materials based on object detection. J. Tongji Univ. (Nat. Sci. Ed.) 2023, 51, 1701–1710. [Google Scholar] [CrossRef]
  48. Chen, J.; Huang, Q.; Chen, W.; Li, Y.; Chen, Y. Automated counting of steel construction materials: Model, methodology, and online deployment. Buildings 2024, 14, 1661. [Google Scholar] [CrossRef]
Figure 1. Test images of different types of construction materials: (a) cross-sections of rebars, (b) cross-sections of square steel pipes, (c) cross-sections of I-beams, (d) cross-sections of wooden beams.
Figure 2. F1-score performance under different IoU thresholds.
Table 1. Comparison of popular multimodal large artificial intelligence models. (All URLs accessed on 12 August 2025.)
Model Name | Developer | Release Date | Training Dataset | Number of Parameters (Billion) | Main Architecture | Implementation Strategy | Open Source Status | Application Fields | URL (Uniform Resource Locator)
InternVL-Chat-V1.5 [12] | Shanghai AI Laboratory | 2024.04 | High-quality bilingual dataset covering common scenes and document images | 25.5 | InternViT-6B-448 px-V1-5 + MLP + InternLM2-Chat-20B | Pretraining stage: ViT + MLP; supervised fine-tuning stage: ViT + MLP + LLM | Open Source | Visual question answering, Character recognition, Real-world understanding | https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
LLaVA-UHD [13] | Tsinghua University | 2024.03 | CC-595 K and 656 K mixture dataset | N/A | CLIP-ViT-L/14 + Perceiver Resampler + Vicuna-13B | Modularized Visual Encoding + Compression of Visual Tokens + Spatial Schema | Open Source | Visual question answering, Optical character recognition, Vision-language understanding tasks | https://github.com/thunlp/LLaVA-UHD
GPT-4 [14] | OpenAI | 2023.03 | Web pages, books, articles, and conversations | Exceeds GPT-3’s 175 | Transformer + Mixture of Experts (MoE) + Self-Attention | Large-scale pre-training + Fine-tuning | Closed Source | Natural language processing, Dialogue systems, Content generation | https://chatgpt.com
Qwen-VL [15] | Alibaba Cloud | 2023.08 | Multilingual multimodal cleaned corpus | 9.6 | ViT + Qwen-7B + Vision-Language Adapter | Multi-task Pretraining + Supervised Fine-tuning | Open Source | Image captioning, Visual question answering, Grounding | https://github.com/QwenLM/Qwen
ERNIE-ViL 2.0 [16] | Baidu | 2022.09 | 29 M public datasets (English), 1.5B Chinese image-text pairs | N/A | EfficientNet-L2 + BERT-large + ViT + ERNIE | Multi-View Contrastive Learning Framework | Open Source | Cross-modal retrieval, Visual question answering, Multimodal representation learning | https://github.com/PaddlePaddle/ERNIE
CogVLM [17] | Tsinghua University, Zhipu AI | 2024.01 | LAION-2B, COYO-700 M, and a visual grounding dataset of 40 M images | 17 | ViT + MLP + GPT + Visual Expert Module | Trainable visual expert module to deeply fuse vision and language features | Open Source | Image captioning, Visual question answering, Visual grounding | https://github.com/THUDM/CogVLM
CLIP [18] | OpenAI | 2021.01 | 400 million (image, text) pairs collected from the internet | N/A | ViT + ResNet + Transformer | Contrastive Learning + Pre-training | Open Source | Zero-shot transfer to various computer vision tasks | https://github.com/OpenAI/CLIP
Monkey [19] | Huazhong University, Kingsoft | 2023.11 | 19 different datasets, including 1.44 million samples | 9.8 | ViT-BigG + Qwen-VL + LoRA | Multi-level description generation method + sliding window method | Open Source | Image captioning, Visual question answering, Document-oriented visual question answering | https://github.com/Yuliang-Liu/Monkey
IMAGEBIND [20] | Meta AI | 2023.05 | Image-text, video-audio, image-depth, image-thermal, video-IMU | N/A | All modality encoders based on Transformer | Leverages image-paired data for joint embedding across six modalities | Open Source | Cross-modal retrieval, Zero-shot recognition, Few-shot recognition | https://facebookresearch.github.io/ImageBind
Gemini 1.5 Pro [21] | Google | 2024.02 | Multimodal and multilingual data | N/A | Sparse mixture-of-experts (MoE) Transformer | JAX + ML Pathways + TPUv4 Distributed Training | Closed Source | Multilingual translation, Multimodal long-context understanding | https://gemini.google.com/app
Yi-VL [22] | 01.AI | 2023.11 | 3.1 T tokens of English and Chinese corpora | 6/34 | Transformer + Grouped-Query Attention + SwiGLU + RoPE | Train ViT and projection + Train at higher image resolution + Joint training of all modules | Open Source | Language modeling, vision-language tasks, long-context retrieval, chat models | https://huggingface.co/01-ai
MiniCPM-V-2_5 [23] | OpenBMB | 2024.05 | SigLip-400 M | 8 | Improved version based on Llama3-8B-Instruct | LoRA fine-tuning + GGUF format + quantization + NPU optimization | Open Source | Multilingual support, Mobile deployment, Multimodal tasks | https://github.com/OpenBMB/MiniCPM-V
XVERSE-V-13B [24] | Shenzhen Yuanxiang | 2024.04 | 2.1 billion image-text pairs and 8.2 million instruction data points | 13 | Clip-vit-large-patch14-224 + MLP + XVERSE-13B-Chat | Large-scale multimodal pre-training + Fine-tuning | Open Source | Visual question answering, Character recognition, Real-world understanding | https://www.modelscope.cn/models/xverse/XVERSE-V-13B
BLIP-2 [25] | Salesforce Research | 2023.01 | Large dataset of 129 M images | 0.188 (Q-Former) | Q-Former + Frozen Image Encoder + Frozen LLMs | Vision-language representation learning and generative learning | Open Source | Visual question answering, Image captioning, Image-text retrieval | https://github.com/salesforce/LAVIS/tree/main/projects/blip2
BriVL [26] | Renmin University of China, Chinese Academy of Sciences | 2021.07 | 30 million image-text pairs | 1 | Two-tower architecture, including text encoder and image encoder | Contrastive learning enhanced with MoCo for managing large negative sample sets efficiently | Open Source | Image-text retrieval, Captioning, Visual understanding | https://github.com/BAAI-WuDao/BriVL
Hunyuan [27] | Tencent | 2023.09 | N/A | N/A | N/A | N/A | Closed Source | Content creation, Logical reasoning, Multimodal interaction | https://hunyuan.tencent.com
EVA-CLIP-18B [28] | Beijing Academy of Artificial Intelligence | 2024.02 | Merged-2B, LAION-2B, COYO-700 M, LAION-COCO, Merged-video | 18 | Based on CLIP, utilizing both vision and language components | Weak-to-strong vision scaling + RMSNorm and LAMB optimizer | Open Source | Image classification, Video classification, Image-text retrieval | https://github.com/baaivision/EVA/tree/master/EVA-CLIP-18B
mPLUG-Owl2 [29] | Alibaba | 2023.11 | 400 M image-text pairs | 8.2 | ViT-L + LLaMA-2-7B | Modality-adaptive modular network + Pre-training with joint tuning | Open Source | Multi-modal tasks, Vision-language tasks, Pure-text tasks | https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2
iFlytek Spark V3.5 [30] | iFLYTEK | 2024.01 | N/A | N/A | N/A | N/A | Closed Source | Multilingual capability, Knowledge-based question answering, Text generation | https://xinghuo.xfyun.cn
Claude 3 Family [31] | Anthropic | 2024.03 | Proprietary mix of public and non-public data as of Aug 2023 | N/A | N/A | Utilizes various training methods, including unsupervised learning and Constitutional AI | Closed Source | Reasoning, Coding, Multilingual understanding | https://docs.anthropic.com
Table 2. Comparison of purely visual large artificial intelligence models. (All URLs accessed on 12 August 2025.)
Model Name | Developer | Release Date | Number of Parameters (Million) | URL (Uniform Resource Locator)
SAM [33] | Meta AI | 2023.04 | 68 | https://github.com/facebookresearch/segment-anything
HQ-SAM [34] | ETH Zürich, HKUST | 2023.10 | Slight increase over SAM | https://github.com/SysCV/SAM-HQ
FastSAM [35] | Chinese Academy of Sciences | 2023.06 | 68 | https://github.com/CASIA-IVA-Lab/FastSAM
SegGPT [36] | Beijing Academy of Artificial Intelligence | 2023.04 | 150 | https://github.com/baaivision/Painter
SEEM [37] | Microsoft Research | 2023.12 | N/A | https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once
Table 3. Test results of multimodal large artificial intelligence models for building material counting. Each cell of the original table is a screenshot of the model’s counting answer; the images are not reproduced here.
Columns (test photos and ground truth): rebars (real number: 85), square steel pipes (real number: 60), I-beams (real number: 60), wooden beams (real number: 60).
Rows (models tested): GPT-4, ERNIE Bot, Qwen, GLM-4, mPLUG, Spark Desk, Gemini, Hunyuan, Claude3, MiniCPM.
Table 4. Test results of various interaction methods based on the SAM. Each cell of the original table is a segmentation result image; the images are not reproduced here.
Columns (materials): rebars, circular steel tubes, square steel pipes, I-beams, wooden beams, wheel fasteners.
Rows (interaction strategies): interactive point selection, interactive box selection, automatic segmentation.
Table 5. Comparison of rebar detection results from different models. The detection result images are not reproduced here; only the counts are listed.
Sample image (real number) | EasyDL model | Li’s model
164 | False: 4, Missed: 10 | False: 0, Missed: 0
63 | False: 3, Missed: 0 | False: 0, Missed: 0
189 | False: 5, Missed: 2 | False: 0, Missed: 0
243 | False: 1, Missed: 13 | False: 0, Missed: 0
Table 6. Optimization suggestions provided by EasyDL model.
No. | Affected Metric | Impact Level | Root Cause Analysis | Optimization Strategy
1 | Accuracy | High | Color bias has a significant impact on accuracy, with a variance of 0.0127 across different feature ranges. | Configure “Color, Posterize” in [Add Data] → [Data Augmentation Strategy] for enhancement.
2 | Accuracy | High | Color bias has a significant impact on miss rate, with a variance of 0.0127 across different feature ranges. | Configure “Color, Posterize” in [Add Data] → [Data Augmentation Strategy] for enhancement.
3 | Accuracy | High | Saturation has a significant impact on accuracy, with a variance of 0.0123 across different feature ranges. | Configure “Color” in [Add Data] → [Data Augmentation Strategy] for enhancement.
4 | Accuracy | High | Saturation has a significant impact on miss rate, with a variance of 0.0123 across different feature ranges. | Configure “Color” in [Add Data] → [Data Augmentation Strategy] for enhancement.
5 | Accuracy | High | Target box size has a significant impact on accuracy, with a variance of 0.0116 across different feature ranges. | Try higher-precision models, or try small object detection or more optimization strategies in Baidu Machine Learning (BML).
Table 7. Comparison of other building materials detection results between different models. The detection result images are not reproduced here; only the counts and AP50 values are listed.
Sample image (real number) | EasyDL model | Other’s model
98 | False: 1, Missed: 12 | False: 0, Missed: 0
130 (AP50: 86.85%) | False: 4, Missed: 8 (AP50: 90.54%, 93.01%) | False: 0, Missed: 0 (AP50: 91.41%)
55 | False: 4, Missed: 0 (AP50: 97.68%) | False: 0, Missed: 0 (AP50: 99.48%)
99 | False: 8, Missed: 2 (AP50: 96.63%) | False: 0, Missed: 0 (AP50: 97.06%)
36 | False: 1, Missed: 1 (AP50: 98.81%) | False: 0, Missed: 0 (AP50: 99.40%)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
