3.2. Phase I: Automated Quantity Takeoff from the BIM Model
Phase I defines the quantity and specification base used by the subsequent matching and costing phases. For each estimating-relevant element, the workflow records the element class, source identifier, quantity basis, quantity value, quantity source, and available descriptive cues such as type name, material, finish, fire rating, and selected property-set values. Native model quantities are used when available, whereas geometry- or metadata-derived quantities are used when native quantities are absent or insufficient. The quantity basis is assigned according to the construction meaning of the element, with discrete objects measured by count, planar systems measured by area, and linear elements measured by length.
3.3. Phase II: Multimodal Description Standardization for Estimator-Readable Matching
Phase II addresses the descriptive gap between BIM model-derived component information and the construction-and-pricing terminology used in structured unit-price cost databases. Although BIM model records often contain technically meaningful information, their labels are rarely written in a form that directly supports reliable cost item retrieval. In practice, model-side descriptions may include abbreviations, exporter-specific naming conventions, fragmented material expressions, or incomplete wording, whereas cost database entries are organized around construction descriptions, measurement units, assemblies, finishes, and unit price assumptions.
To reduce this gap, the workflow applies multimodal description standardization before retrieval. In this study, the standardization component uses an instruction-tuned language model for BIM model text fields and a vision language model. Its role is limited to improving retrieval compatibility by converting BIM model-derived descriptive fields and visual assembly cues into concise, estimator-readable descriptions.
To demonstrate the LLM standardization step, an interior partition-wall record is used as an example.
Figure 2,
Figure 3 and
Figure 4 present the standardization prompt, the structured YAML input from the BIM model, and the resulting JSON output. The input is supplied as structured YAML so that the model receives explicit fields, including wall type, quantity basis, material cues, nominal thickness, and fire rating, rather than an unconstrained paragraph.
The LLM output is constrained to JSON so that downstream retrieval receives a predictable field structure.
The visual example uses a window object as a separate Phase II demonstration. The window image in
Figure 5 was used as supplementary visual input together with the Phase I text record. The purpose of the VLM check (VLM prompt shown in
Figure 6) was to identify visible component cues that could support later retrieval, such as framed opening condition, panel arrangement, glazing presence, and whether the object appears to be a window rather than a wall panel or opening void. The VLM output was therefore treated as supplementary descriptive input and was not used to generate costs, assign quantities, or override the cost database.
The summarized VLM output identified the object as a framed window-like component with a rectangular frame, visible glazing or panel surface, and a vertical mullion or meeting rail. For cost matching, these cues support retrieval toward window records rather than generic openings, wall panels, or other envelope elements.
At the end of Phase II, the LLM output and VLM output are merged only at the same-element level. The LLM-derived standardized description remains the primary retrieval query, while BIM quantity basis, material fields, thickness, finish, and performance-related attributes are copied from the structured Phase I record. VLM-derived cues are appended as auxiliary retrieval terms only when they are visually supported, and VLM limitations are retained as restrictions on information that cannot be inferred from the image. The merged Phase II record is then passed to Phase III, where the standardized description supports dense retrieval and the structured fields support rule-based unit, material, thickness, finish, assembly, and fire rating filters.
3.4. Phase III: Retrieval-Guided Cost Matching and Record Selection
Phase III is implemented as a constraint-guided record selection process that maps each standardized BIM model record either to one cost database item or to a conservative abstention outcome. Let i index BIM model-derived query records and let j index cost database entries. For query i, the Phase II output is a standardized descriptor , and each database entry is represented by a retrieval text . Dense retrieval supplies candidate recall, while compatibility constraints, lexical refinement, shortlist retention, and bounded reranking preserve pricing validity.
Let
denote the sentence embedding function. The semantic similarity between query
and database entry
is computed by cosine similarity,
and dense retrieval returns the top-
K candidate pool
The set is the retrieval–recall pool rather than a final decision set.
Each candidate
is then screened by compatibility constraints. Let
denote hard unit-basis compatibility, where
only when the query quantity basis is dimensionally compatible with the database entry unit. Let
denote the non-unit hard-feasibility indicator, where
only when no rule-based exclusion is triggered by available class anchors, clearly contradictory dimensions, or contradictory specification-bearing cues such as material, finish, support context, and performance attributes. The feasible candidate set is therefore
For each feasible candidate, lexical refinement and rule-based information are combined in a unified score. The ranking indicators below are soft preferences applied only after hard feasibility screening. Let
denote agreement indicators for category, exact unit-form agreement within the already unit-compatible set, primary material, secondary material, finish, and performance-related cues, respectively. Let
be the normalized lexical similarity between the Phase II descriptor and the database search text, and let
be the normalized lexical similarity between the source model name and the database wording; both lexical terms are implemented using RapidFuzz sequence similarity. Let
denote the graded thickness bonus,
where
is the thickness difference between query-side and entry-side information when both are available. The unified rule-based score is then
for all
. The additive score
is an implementation-level heuristic ranking score rather than a probability or calibrated utility. Because the weighted terms and thickness bonus are summed directly,
is not constrained to
; the cutoffs below therefore act as case study decision thresholds rather than confidence levels. The weights
, thickness bonuses
, and thickness breakpoints
are preset implementation parameters for the present case study.
Only candidates with sufficient rule-based support are retained. Let
denote the minimum retention threshold and let
M denote the shortlist cap. The retained shortlist is
Equation (
6) matches the implementation rule that only candidates scoring at least
are retained and that at most
M explicit cost database entries are passed forward to the decision phase.
The final entry decision is made by either the rule-based phase or bounded reranking. Let
denote the strongest surviving rule-based score when
. When
,
is left undefined and the decision rule defaults directly to abstention. Let
denote the ordinal preference score induced by selection-only LLM reranking over the explicit shortlist candidates only, with larger values preferred. The selection rule is
where ∅ denotes abstention. The LLM is therefore invoked only when the strongest surviving rule-based candidate remains below
.
The fallback policy preserves the same conservative logic. If bounded reranking does not return a valid shortlist selection, the workflow falls back to the strongest surviving rule-based candidate when
and otherwise abstains:
In all non-abstaining cases, the final unit rate is read directly from the selected cost database entry ; no unit rate is generated by the language model. Phase III is therefore a constraint-guided item-selection process in which retrieval supplies recall, rule-based filters enforce feasibility, bounded reranking resolves residual ambiguity, and abstention is preferred over unsupported pricing.
Figure 7 provides a step-by-step summary of this selection logic. It is included to make the operational sequence explicit while preserving the mathematical definitions in Equations (
1)–(
9).
Figure 8 illustrates the same Phase III sequence from dense semantic search through rule-based filtering to the final accepted cost database item. It should be read as a compact visual counterpart to Equations (
1)–(
9) and
Figure 7. The figure does not add new algorithmic logic; it summarizes the same sequence of candidate recall, rule-based screening, weighted scoring, shortlist retention, bounded reranking, and final item acceptance from the cost database.
3.5. Phase IV: Database-Based Cost Synthesis and Reporting
Phase IV converts accepted quantity-by-rate pairs into component-level and project-level cost outputs. For each matched item, the extracted quantity is multiplied by the selected database unit rate:
where
is the extracted quantity and
is the unit price read from the selected cost record. The project total is then the sum of all accepted line-item costs:
This phase keeps the final estimate tied to measurable quantities and explicit cost database records. The reporting output retains the model element class, source identifier, quantity source, standardized description, matched cost record reference, unit basis, selected unit price, and final cost contribution.