Article

A Template-Based Approach for Industrial Title Block Compliance Check

1 Digital Excellence Center, Assystem, 92400 Courbevoie, France
2 Technology & Innovation, Assystem, 92400 Courbevoie, France
* Author to whom correspondence should be addressed.
Algorithms 2026, 19(2), 105; https://doi.org/10.3390/a19020105
Submission received: 23 December 2025 / Revised: 20 January 2026 / Accepted: 23 January 2026 / Published: 29 January 2026

Abstract

Title block compliance checking requires interpreting irregular tabular layouts and reporting structural inconsistencies, not only extracting metadata. This paper introduces a user-in-the-loop, template-based method that leverages a graphical annotation workflow to encode title block structure as a hierarchical annotation graph combining detected primitives (cells/text) with user-defined semantic entities (key–value pairs, tables, headers). The resulting template is matched onto target title blocks using relative positional constraints and category-specific rules that distinguish acceptable variability from non-compliance (e.g., variable-size tables versus missing fields). The system outputs extracted key–value information and localized warning logs for end-user correction. On a real industrial example from the nuclear domain, the approach achieves 98–99% compliant annotation matching and 84% accuracy in flagging structural/content deviations, while remaining tolerant to moderate layout changes. Limitations and extensions are discussed, including support for additional fields, improved key similarity metrics, operational deployment with integrated feedback, and broader benchmarking.

1. Introduction

1.1. Application Context

In engineering practice, technical documentation must undergo rigorous compliance checks to ensure reliability and consistency. Conventional verification procedures typically follow a multi-step workflow that mobilizes a significant number of highly qualified personnel. Although indispensable, these activities consist largely of repetitive and time-consuming tasks that provide limited added value relative to the level of expertise required and remain susceptible to human error. Automated document validation offers a means to address these limitations. When integrated within existing control workflows, it can reduce undetected errors, enhance overall documentation quality, and lower the number of person-hours devoted to routine verification, thereby enabling experts to concentrate on tasks of higher technical complexity. In addition, early and systematic validation can help limit the risk of costly rework, project delays, and budget overruns. Consequently, the automation of document verification supports more efficient resource allocation and contributes to the robustness of engineering project outcomes.
This paper is part of a project to address the automation of technical document validation across diverse engineering disciplines and document formats, with the objective of developing a robust and efficient large-scale compliance pipeline. A key requirement of this pipeline is its adaptability to domain-specific or project-specific rules, which must be defined and managed by end users. Consequently, the understandability and customizability of the system represent central design objectives. The resulting software integrates user-oriented functionalities, including visual feedback and correction suggestions, either as annotations directly overlaid on the document or as margin comments. Additionally, the system generates a separate quality check report, thereby combining automated compliance assessment with user guidance to support iterative document refinement.

1.2. Title Block Compliance Check

A major aspect of technical document verification concerns the title block. A title block is an irregular table, generally located on the first page of a text document or in the corner of an industrial drawing, which functions both as a summary of the document’s metadata and as its formal identifier. It commonly includes the document revision history, its identification information, and the applicability context for the remainder of the document (see Figure 1). Ensuring the accuracy of the title block is critical, as inconsistencies or irregularities transmitted to a client or integrated into internal tracking and approval processes may constitute a contractual non-conformance, potentially triggering internal investigations that are costly in terms of time and resources. This underscores the need for automating title block verification to reduce the workload on engineers while proactively minimizing errors in client deliverables. In industrial contexts, this need is further amplified by the heterogeneity of document production environments. Engineering deliverables are often produced by multiple contractors and subcontractors, who may rely on different authoring tools and do not necessarily benefit from dedicated title block validation features embedded in their design software. As a result, compliance cannot be guaranteed at creation time and must instead be assessed a posteriori on the delivered documents, independently of the software originally used to generate them. Moreover, because the title blocks considered in this work typically originate from modern digital workflows (e.g., native CAD exports or high-quality PDFs), image degradation is not the primary difficulty addressed here. Rather, the challenge lies in reliably identifying structural non-compliances (e.g., missing or merged cells, misplaced fields, inconsistent tabular organization) and communicating them in an actionable manner.
Importantly, this challenge is not limited to hierarchical or nested information extraction: effective compliance checking also requires explicit detection of violations of structural consistency constraints and the production of interpretable, localized feedback that can be used within existing workflows. Therefore, we propose a title block quality control methodology comprising three main steps. The information extraction step consists of reading the contents of the title block in a structured format, typically as a set of key–value pairs. The compliance and consistency verification step ensures that the structure and information contained in the title block comply with a predefined set of rules, often specific to a given client or project. Finally, the feedback provision step communicates the results of the compliance verification to the user, including warning messages and corrective suggestions for each identified non-compliance.
To the best of our knowledge, no prior work in the literature has fully and formally addressed the problem of automated title block compliance checking. In the following discussion, we review existing approaches in the literature and relevant off-the-shelf tools for title block processing, and discuss their shortcomings with respect to the title block quality control steps stated above.

2. Related Work and Existing Solutions for Title Block Compliance Checking

Automated compliance checking of engineering documentation combines document understanding—to extract structured information from heterogeneous layouts—with explicit rule evaluation and feedback provision. In the specific case of title blocks (also referred to as title boxes), the task goes beyond extracting a few metadata fields: it requires interpreting an irregular, often highly customized table, detecting both content-wise and structural non-compliances, and reporting them in a form that supports existing control workflows.
This section reviews academic contributions on title block processing [2,3,4,5,6,7,8], together with commercial solutions for title block extraction [9,10,11] and general document understanding systems. We cover cloud-based platforms with customizable extraction models [12,13,14], as well as local building blocks and on-premise document AI approaches that can be integrated into industrial pipelines [15,16,17,18,19,20,21,22]. Throughout the review, we assess each approach with respect to the three stages of a compliance pipeline: structured extraction, rule-based verification, and feedback provision to the user.

2.1. Scope and Evaluation Criteria: From Extraction to Compliance

Most work on title block processing targets information extraction as an end goal, for applications such as drawing retrieval, indexing, or downstream analytics [2,3,4,5,6,7,8]. In contrast, title block compliance checking requires additional capabilities. First, the extracted representation must preserve enough layout information to support structural constraints (e.g., missing or merged cells, misplaced fields, unexpected table topology). Second, the system must explicitly identify violations rather than merely producing a “best-effort” extraction. Third, results must be communicated through interpretable and actionable feedback (e.g., localized warnings and correction suggestions) to support iterative refinement.
Accordingly, we evaluate existing approaches based on (i) how structure-aware their outputs are (key–value pairs versus grid- or entity-based layouts), (ii) their ability to distinguish acceptable variability from non-compliances, and (iii) whether they provide workflow-oriented feedback mechanisms.

2.2. Compliance Requirements and Standards

Title block compliance targets are typically defined through a combination of formal standards and project- or company-specific rules. Some title block standards exist, such as ISO 7200:2004 [23], which specify expected fields, general layout, and representation guidelines. While such standards provide prescriptions, they also leave some freedom in implementation, and adherence is variable in practice due to legacy conventions, domain constraints, and contractor-specific templates [3]. In addition, organizations frequently define internal rules (naming conventions, controlled vocabularies, cross-document consistency requirements) that complement or override standard recommendations.
From an operational perspective, compliance checks can be grouped into four families (a minimal sketch of the first family is given after this list):
  • Presence and format compliance: required fields are present, non-empty, and follow standard patterns (e.g., drawing number syntax, date formats).
  • Standard-conformance compliance: structure and content conform to standards such as ISO 7200:2004 and to company variants (e.g., sheet sizes, border positions, title block position).
  • Semantic compliance: values belong to controlled vocabularies (units, material codes, project codes). In practice, this is often approximated by dictionaries or pattern-based post-processing of Optical Character Recognition (OCR) outputs [4,9].
  • Cross-document compliance: title block values are consistent with other project artifacts such as bills of materials (BOM), checklists, and revision histories. Industrial case studies mention cross-checking title block information against BOMs and tag lists [4,8].
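As a concrete illustration of the first family, the following minimal sketch checks field presence and format with regular expressions. The field names and patterns are hypothetical assumptions chosen for illustration, not a normative schema.

```python
import re

# Hypothetical required fields and format patterns (assumptions, not a standard).
REQUIRED_FIELDS = {
    "drawing_number": re.compile(r"[A-Z]{2,4}-\d{4}-[A-Z]\d{2}"),  # e.g., "ABC-1234-D01"
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),                      # e.g., "2026-01-23"
    "revision": re.compile(r"[A-Z]|\d{2}"),                        # e.g., "B" or "02"
}

def check_presence_and_format(fields):
    """Return human-readable warnings for missing, empty, or ill-formed fields."""
    warnings = []
    for name, pattern in REQUIRED_FIELDS.items():
        value = fields.get(name)
        if value is None or not value.strip():
            warnings.append(f"Missing or empty required field: '{name}'")
        elif not pattern.fullmatch(value.strip()):
            warnings.append(f"Field '{name}' does not match the expected format: '{value}'")
    return warnings

# Example: a malformed drawing number and a missing revision yield two warnings.
print(check_presence_and_format({"drawing_number": "AB-12-X", "date": "2026-01-23"}))
```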

2.3. Title Block Information Extraction as a Pipeline

Title block information extraction can be decomposed into three subtasks: the title block localization task, the structural modeling task, and the text recognition and semantic labelling task. This decomposition is particularly relevant for compliance, as structural constraints and error explanations depend on the availability and fidelity of intermediate representations.

2.3.1. Localization of the Title Block Region

The localization task identifies the region containing the title block on the drawing sheet. Typical methods include (i) rule-based geometric heuristics that assume bottom-right position, margins, or standard sheet layouts [7], (ii) morphological and structural analysis of connected components and frames [3], and (iii) CNN-based object detection models [5,6] such as YOLO [24]. In modern digital workflows, where drawings are often exported as vector PDFs or high-quality raster images, localization is typically robust. However, localization errors propagate to downstream structure inference and may be mistaken for actual structural non-compliances.

2.3.2. Structural Modeling and Layout Reconstruction

The structural modeling task aims to represent the internal structure of the title block (cells, fields, labels) in a layout-independent way. Reliably interpreting a title block layout is challenging because title blocks often exhibit highly irregular structures. In practice, a title block may combine nested generic entities in varying configurations. These can include form and comb fields containing textual entries; selection fields, typically represented by checkboxes storing boolean values; and tabular fields corresponding to regular tables with column or row headers. Capturing such heterogeneity is essential to enable structural constraints and to produce meaningful feedback when non-compliances occur. Template-based approaches commonly represent title blocks as grids, model line topology using intersection-point matrices or graphs, and then perform (possibly fuzzy) matching against reference templates to handle multi-style and multi-template cases [4]. Earlier systems also rely on structural cues extracted from line detections and connected components to reconstruct a plausible table layout before field assignment [3].

2.3.3. Text Recognition and Semantic Field Labeling

The text recognition and semantic labelling task aims to recognize the textual content and map it to semantic fields (drawing number, revision, material, etc.). The typical output of this task is a set of key–value pairs. Field assignment can be achieved through geometric and format-based rules (e.g., “cell at row X, column Y is the drawing identifier”) in engineering drawing pipelines [8]. More recent approaches combine OCR outputs with learned models to infer field labels, including deep learning pipelines for title block processing [2,5]. Vision–language models can also be used for post-processing, for instance by transforming extracted text and positional cues into structured JSON metadata [5], or by improving robustness to recognition noise through transformer-based text recognition modules [2]. From a compliance perspective, semantic labelling is necessary but insufficient if the extracted representation discards the spatial organization that supports structural rules.

2.4. Title Block Structural and Content-Wise Rule Checking

While information extraction has been widely studied, explicit rule checking is less frequently formalized as a standalone research objective. In practice, compliance is often implemented as a downstream layer that consumes extracted fields and evaluates them against organization-specific rules. Industrial systems may combine learned detectors and OCR with such rule layers, for quality assurance or production-oriented checks, even when the underlying rule sets are not publicly disclosed [9]. Academic and industrial pipelines also report content-wise validation steps such as dictionary-based normalization, pattern enforcement, and cross-checks against external sources (e.g., BOMs or tag lists) [4,8].
However, two limitations are recurrent. First, many systems assume that structural variability is “handled” by robust extraction, and therefore do not expose structural anomalies (e.g., missing cells) as explicit violations. Second, rule checking components are often designed for a narrow schema (fixed field sets and positions) and are difficult to adapt to contractor-specific title blocks without substantial engineering effort. These observations motivate approaches that keep an explicit representation of the title block structure and that treat both structural and content-wise constraints as first-class compliance objectives.

2.5. Feedback Provision and Workflow Integration

Feedback and workflow integration determine whether automated checks can be adopted in engineering practice. Existing systems commonly connect title block extraction to document search or indexing, or to downstream inspection and QA pipelines [2,5]. Commercial tools typically provide interfaces to visualize extracted entities or to define extraction templates, supporting integration with Computer Assisted Design (CAD) or Building Information Modeling (BIM) or document management ecosystems [9,10,11].
Some off-the-shelf solutions incorporate a graphical user interface, either to allow users to define a title block template in template-based approaches [10,11], or to visualize detected entities in data-driven approaches [9,12,13], as presented in Figure 2. While these interfaces facilitate visualization and configuration, they typically provide limited support for expressing and diagnosing structural violations in heterogeneous title blocks. In addition, the nested and irregular structure of title block entities is not always explicitly represented, which can hinder user interpretation in the presence of extraction errors. In contrast, compliance-oriented systems must provide localized and interpretable feedback that highlights both structural and content-wise non-conformances and supports correction.

2.6. Commercial and Cloud Document AI Solutions

Commercial solutions address title block extraction either as a domain-specific feature integrated into engineering software ecosystems, or as an instance of general document understanding. The latter category has recently been reshaped by generative AI-based services capable of extracting structured information from a wide variety of document types.

2.6.1. Title-Block-Oriented Products and CAD/BIM Tools

Commercial solutions for title block extraction considered here include Werk24 (V2.3.0) [9], Kahua (V2025.5) [10], and AutoCAD (V25.1) [11]. Kahua and AutoCAD integrate title block processing into broader software environments (e.g., BIM/CAD or document management workflows), typically exposing configuration interfaces to define the expected fields and their locations. Werk24 provides extraction as an independent service and focuses on mechanical engineering drawings, including domain-specific metadata. Such products are effective within their target contexts, but their assumptions and field schemas may not transfer to heterogeneous contractor deliverables or to title blocks that deviate from the supported domain conventions.

2.6.2. Generic Document Understanding Services with Customization

The extraction of information from title blocks can be regarded as a specific instance of document layout understanding, which involves identifying and structuring entities within documents ranging from highly structured to unstructured formats. Cloud-based services such as Microsoft Azure Document Intelligence, Google Cloud Document AI, and Amazon Bedrock Data Automation provide solutions capable of performing layout-aware extraction across many standard document types. Among these, the respective custom models—Azure’s Custom Template [12], Google’s Custom Extractor [13], and Amazon’s Custom Blueprint mechanisms [14]—are of particular interest for title blocks, as they combine pre-trained models with configurable schemas to adapt to domain-specific layouts. Model adaptation is typically supported through annotated training examples, with Azure and Google providing graphical annotation interfaces, while Amazon adopts an instruction-based mechanism for specifying fields to extract.
Although these platforms can deliver strong key–value extraction performance, their primary objective is information extraction rather than compliance. As a result, structural anomalies may be absorbed by the model’s robustness and not surfaced as explicit violations. When layout outputs (tables or regions) are available, they can serve as a basis for structural checks, but additional steps are required to link keys and values and to enforce domain-specific and project-specific constraints.

2.7. Local Deep Learning Document Understanding Solutions

Beyond cloud services, a growing ecosystem of local document AI toolkits and models can be deployed on-premise and integrated into industrial pipelines. These approaches are typically designed for generic document image analysis (forms, invoices, receipts, scientific articles) rather than engineering drawings, but they provide reusable components for layout parsing and information extraction.
LayoutParser [15] is a unified toolkit for document layout analysis that facilitates the construction of pipelines combining layout detection, OCR, and structured representations. It provides abstractions to integrate state-of-the-art detectors and can therefore be used as an implementation backbone for title block localization and layout reconstruction. LayoutLMv3 [16] is a multimodal transformer pre-trained with unified text and image masking, targeting downstream tasks such as document classification and information extraction. In such models, OCR tokens and their bounding boxes are central inputs, which makes them suitable when reliable OCR and spatial annotations are available. Finally, Donut [17] proposes an OCR-free end-to-end transformer that directly generates structured outputs from document images. While attractive for simplified pipelines, OCR-free generation can reduce interpretability and control, which is a concern for compliance scenarios where explicit localization of violations is required.

2.8. OCR Pipelines for Raster/Scanned Drawings Solutions

When engineering documents are available as scans or rasterized PDFs, OCR becomes a key dependency of the extraction pipeline. Local OCR engines and libraries provide practical options for on-premise deployment. Tesseract [18] is a long-standing open-source OCR engine widely used as a baseline. More recent deep-learning-based toolkits, such as PaddleOCR [19], docTR [20], and EasyOCR [21], offer modular pipelines that combine text detection and text recognition models and can be adapted to domain-specific fonts and rendering conditions. In addition, general-purpose vision frameworks such as Detectron2 [22] can be used to train custom detectors or segmenters to localize title blocks or specific graphical elements (e.g., revision tables) prior to OCR.
From a compliance perspective, OCR robustness reduces the risk of content-wise false alarms, but it does not address structural violations unless the downstream representation preserves layout and supports explicit constraint checking.

2.9. Method Families and Their Implications for Compliance

Across both academic and commercial solutions, two methodological families are commonly encountered: non-data-driven approaches based on rules and templates, and data-driven approaches that infer structure from visual and textual content. These families offer different trade-offs in terms of interpretability, robustness, and suitability for structural compliance checking.

2.9.1. Non-Data-Driven Approaches: Heuristics and Template Matching

Non-data-driven approaches include heuristic methods and template-based methods. Heuristic methods define rules based on expected border positions and margins of title blocks, text density, and line detection cues [3,10,11]. They are typically interpretable and can provide explicit error conditions (e.g., missing borders or unexpected geometry), which is useful for compliance. However, they often assume high-quality “pixel perfect” documents and can be fragile when templates vary significantly. Template-based methods represent title blocks as grids or tables and perform matching against a library of reference templates, possibly with fuzzy matching to accommodate geometric variations [4]. When template libraries capture expected variants, such methods can naturally preserve structure and therefore support structural rule checking; the main limitation lies in maintaining comprehensive template coverage across contractors and projects.

2.9.2. Data-Driven Approaches: CNN, Transformers and VLM/LLM-Based Systems

Data-driven approaches infer title block structure using learned models. Deep learning computer vision pipelines typically rely on CNN-based detectors (and more recently vision transformers, or ViTs) for localization and on OCR models for text extraction, sometimes combined with specialized segmentation components [2,5,6,15,16,17,22]. More recently, multimodal systems combine visual features with large vision–language models (e.g., GPT-4o [25], Qwen2-VL [26]) to jointly reason about layout and content, producing structured metadata directly [2,5]. Cloud document understanding platforms also fall into this category, as they use pre-trained models (and sometimes agentic orchestration) to extract key–value pairs and tables from documents [12,13,14].
Data-driven solutions are generally robust to layout distortions and image noise, in contrast to domain-specific software that often assumes stricter input quality conditions. This robustness also explains their widespread adoption for legacy document digitization, where degradations and acquisition artifacts are common. However, it can come at the cost of compliance sensitivity: layout anomalies such as missing cells or unintended merges may be absorbed as normal variability rather than being identified as explicit violations. In compliance-oriented settings, this motivates the use of structure-preserving outputs and, more broadly, hybrid architectures that complement data-driven extraction with explicit structural constraint checking.

2.9.3. Illustrative Case Study: Key–Value Pair Extraction vs Layout Preservation

To illustrate how extraction outputs impact compliance, Figure 3 reports a simple experiment conducted on two title blocks (Figure 3a,d) that share the same information but exhibit clearly different structures. We used the Azure Document Intelligence Studio (V4.0) General Document model [27], whose application scope is sufficiently close to our use-case for this demonstration. When prompted to extract title-block key–value pairs, the model returns the same four key–value pairs for both inputs (Figure 3b,e), despite the structural discrepancy between the two layouts. In this setting, the output is unsuitable as a first stage of a pipeline aiming at downstream structural violation detection, because the spatial organization of the elements is not preserved. By contrast, when prompted to output the document layout, the model produces tabular structures (Figure 3c,f) in which positional information is retained. This output can serve as a basis for structural compliance checking, provided that an additional step is introduced to reliably associate each key with its corresponding value.

2.10. Synthesis, Research Gap, and Contributions

Overall, the literature and available tools indicate that title block localization and field extraction can be addressed effectively in many contexts, either through template-based reasoning [4] or data-driven pipelines [2,5]. In parallel, commercial products and cloud platforms provide accessible extraction capabilities and user-facing interfaces for template configuration or visualization [9,10,12]. Nevertheless, formal, structure-aware compliance checking with interpretable feedback is not treated as a first-class problem. Compliance is typically implemented as an implicit downstream layer on top of extraction, often tailored to a fixed schema and a narrow application domain [8,9]. In particular, structural consistency constraints (e.g., missing or merged cells) are rarely surfaced explicitly by robust data-driven extractors. These limitations motivate approaches that preserve structure, support user-defined rule sets, and provide actionable feedback aligned with engineering control workflows. To address this gap, we propose a rule-driven, template-based approach that (i) extracts title block information while preserving its structural organization, (ii) verifies both structural and content-wise constraints against user-defined rules, and (iii) provides explicit and localized feedback with correction guidance in a human-in-the-loop setting.

3. Materials and Methods

3.1. Problem Statement

The objective of our strategy is to provide document authors with a turnkey tool for the automatic quality control of title blocks, while maintaining adaptability to new title block formats. It must also be operable by end users, such as engineers, without the intervention of specialized personnel. Given the inherent variability and ambiguity of title block layouts (which may differ across users, projects, and organizations), the proposed approach seeks to incorporate the user’s domain knowledge directly into the automated extraction pipeline. The method relies on a template-based extraction procedure that uses a graphical interface to collect user input on a given source title block and provide real-time visual feedback on detected inconsistencies or errors on a target title block. Each template consists of a set of annotations (bounding boxes) associated with predefined annotation categories representing common generic entities. The system is designed to handle a representative range of entity types and to remain robust to typical variations and distortions in title block layouts.

3.1.1. Considered Generic Entities

A wide range of generic entities can typically be observed in industrial title blocks (presented in Figure 4). For the purposes of this study, the following categories of generic entities are considered, based on academic literature [28] and existing software [9,10,12,13,14]:
  • Form fields represent the most common type of generic entity. They consist of one or more cells, with at least one cell containing a text block that serves as the key of the key–value pair. The corresponding value may span multiple cells, as is the case with comb fields.
  • Selection fields comprise a checkbox, which encodes a Boolean value, and a text block that defines the associated key. The value of a key–value pair can be inferred either from a single selection field (checkbox field) or from a set of mutually exclusive selection fields (radio button field).
  • Tabular fields, such as revision history tables or applicability tables, are regular tables that include at least one column or row header representing the key of the contained values.
Additionally, some key–value pairs possess an explicit key, meaning that the key can be directly identified from the title block (e.g., the form field presented in Figure 4a). This property allows their position within the target title block to be inferred relative to the position of the key during the template matching phase. Conversely, other key–value pairs lack identifiable text blocks; their correspondence in the target title block must therefore be inferred from their relative position with respect to neighboring key–value pairs. Such is the case of the comb field in Figure 4d.

3.1.2. Considered Title Block Variability and Non-Compliances

A set of common title block distortions has been defined based on those typically handled by title block extraction software. The considered distortions include:
  • The size and position of title block cells may vary depending on project-specific requirements or the software used to generate the document.
  • Although the position or presence of individual cells may differ, the relative spatial relationship between key and value cells must remain consistent between the template and the target.
  • Additional cells may appear in the target title block compared to the template, as in the case of revision history tables, where the number of rows depends on the number of signatories.
  • Textual keys in the target title block may contain typographical errors or minor variations.
However, as discussed in [4], such distortions do not necessarily correspond to non-compliances, as they may simply reflect acceptable variability of an underlying title block structure. Consequently, our method must be able to robustly extract fields in a multi-style setting while also detecting and flagging genuine non-compliances. In addition, the method should allow users to specify which types of variations in the title block are considered acceptable.

3.2. Theoretical Background

A title block is represented as a set of nested entities, which may include both alphanumeric text elements and geometric shapes. For instance, a selection field comprises a text label specifying the field name and an associated checkbox indicating the selection state. Each entity is encoded as an annotation that includes its spatial position (represented by a bounding box) and its associated metadata. The overall layout of a title block is thus expressed as a hierarchical structure of nested annotations, referred to as the annotation graph, see Section 3.2.2. The extraction process is formulated as an assignment problem between entities in the source and target title blocks, based on their relative spatial relationships within their respective annotation graphs. The following section provides a detailed description of the annotation types considered, the construction of the annotation graph, and the entity matching procedure.
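As a minimal sketch of this representation (assuming Python as the implementation language; the concrete field layout is an illustrative assumption, while the category names follow Section 3.2.1), an annotation can be encoded as a category, a bounding box, metadata, and nested children:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One node of the annotation graph: a category, a bounding box,
    optional metadata, and nested child annotations."""
    category: str                                  # e.g., "Cell", "TextBlock"
    bbox: tuple                                    # (x_min, y_min, x_max, y_max)
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

# A selection field as a nested structure: a text label plus a checkbox state.
selection_field = Annotation(
    category="KeyValuePair",
    bbox=(10, 10, 120, 30),
    children=[
        Annotation("TextBlock", (12, 12, 80, 28), {"text": "Accessibilité"}),
        Annotation("Cross", (100, 14, 114, 28), {"checked": True}),
    ],
)
```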

3.2.1. Annotation Categories

Annotations are categorized into two main groups. The first group, referred to as detected annotations, corresponds to the physical structural elements of the title block that can be directly extracted from the document image. These include entities such as cells, subcells, and text blocks. The second group, referred to as manual annotations, represents the semantic structural elements of the title block and must be defined by the user. For example, while text blocks contained within a cell can be automatically extracted from an image, distinguishing which text block represents the key of the cell and which represents its value requires user input. Manual annotations therefore include higher-level constructs such as regular tables or cell keys.
A comprehensive list of all annotation categories, together with a detailed description of each category, is presented in Appendix A. The possible associations between annotation categories form a directed graph, referred to as the annotation categories graph, which is illustrated in Figure 5. Each annotation is assigned a category from this graph, starting from a Root annotation that depicts the entirety of the title block and descending through a hierarchy of KeyValuePairs, Cells, and RegularTables, down to individual TextBlock and Cross annotations.
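The following fragment sketches such a categories graph as an adjacency mapping. Only a subset of categories is shown, and the exact edge set is an assumption made for illustration; the authoritative definition is given in Appendix A and Figure 5.

```python
# Illustrative subset of the annotation categories graph (parent -> children).
# The exact edges are assumptions; see Appendix A and Figure 5 for the full graph.
CATEGORY_GRAPH = {
    "Root": ["KeyValuePair", "RegularTable"],
    "KeyValuePair": ["Cell"],
    "RegularTable": ["Cell"],
    "Cell": ["SubCell", "TextBlock", "Cross"],
    "SubCell": ["TextBlock"],
    "TextBlock": [],
    "Cross": [],
}

def allowed_parents(child_category):
    """Categories that may directly contain `child_category`."""
    return [p for p, children in CATEGORY_GRAPH.items() if child_category in children]

print(allowed_parents("TextBlock"))  # -> ['Cell', 'SubCell']
```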

3.2.2. Annotations Graph Construction

An annotation graph is a directed graph that represents the nested structure of a title block’s annotations. It is constructed from a set of annotations through a series of assignment steps. Annotations belonging to two related categories, as defined in the annotation categories graph (see Section 3.2.1), form a bipartite graph whose edge weights represent similarities between annotations, computed as the intersection over union (IoU) of their bounding boxes. Each annotation from a child category is assigned to its most probable parent annotation in order to obtain a maximum-weight matching across all annotations of the child category. The assignment process is non-bijective, allowing multiple child annotations to be linked to a single parent annotation. Assignments are performed recursively following a breadth-first backward traversal of the annotation categories graph.
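A minimal sketch of this assignment step is given below, assuming axis-aligned (x_min, y_min, x_max, y_max) boxes; the acceptance threshold value is an assumption, chosen low in the spirit of the robustness considerations in Section 3.3.3.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign_children(child_boxes, parent_boxes, threshold=0.05):
    """Assign each child box to its highest-IoU parent. Because each child is
    scored independently, the matching is non-bijective: several children may
    share one parent. Returns a {child_index: parent_index} mapping."""
    assignment = {}
    for ci, child in enumerate(child_boxes):
        scores = [iou(child, parent) for parent in parent_boxes]
        best = max(range(len(parent_boxes)), key=scores.__getitem__)
        if scores[best] >= threshold:
            assignment[ci] = best
    return assignment
```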
To establish correspondences between detected annotations in the source and target title blocks, relative positional information is computed. During traversal of the annotation graph, each annotation (and its associated children) is assigned a recursive identifier and a sibling-relative index that captures its local ordering. Since sibling annotations typically form an irregular grid, we canonicalize their bounding boxes by extending them to define a regular tabular structure. Each annotation is then assigned an identifier based on its span within this canonical grid. This process is applied recursively to generate a unique hierarchical identifier for every annotation in the graph. An example of an annotation graph derived from a simple title block is shown in Figure 6b, together with the corresponding manual and detected annotations overlaid on an example form field in Figure 6a.

3.2.3. Source Title Block to Target Title Block Matching

Applying this template to a target title block consists of identifying, within the target, the instances corresponding to each of the template’s manual annotations, see Figure 7a.
First, the target detected annotation graph (containing only its Cells and TextBlocks) is constructed to provide an overview of its structure, see Figure 7b. Then, the target title block Cells are partitioned into KeyValuePairs by matching each source KeyValuePair to a set of target Cells. For each source KeyValuePair, the Cells whose content is closest to its key, according to the longest common substring metric, are selected together with their neighbors that share the same relative positions as the child Cells of the source KeyValuePair. This assignment strategy is bijective, under the assumption that each KeyValuePair key appears at most once among the target Cells. The target KeyValuePair is then defined as the envelope, or the smallest enclosing rectangular hull, of the matched target Cells.
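The key-to-cell anchoring can be sketched as follows; this is a simplified illustration assuming cells expose a `text` attribute, and the length normalization is our own choice.

```python
from difflib import SequenceMatcher

def lcs_length(a, b):
    """Length of the longest common substring of two strings."""
    a, b = a.lower(), b.lower()
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return match.size

def anchor_cell_for_key(key, target_cells):
    """Pick the target cell whose text best matches the template key.
    Scores are normalized by the key length so long cells do not dominate."""
    def score(cell):
        return lcs_length(key, cell.text) / max(len(key), 1)
    return max(target_cells, key=score)
```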
Once this partitioning is completed, each child annotation of a source KeyValuePair is matched to its most probable counterpart in the target title block using Algorithm 1. This algorithm aggregates new target annotations by recursively applying the matching procedure described in Algorithm 2 to all manual annotations present in the source title block, following a top-down traversal from the root to the leaf nodes of the annotation graph (see Figure 7c). The + symbol indicates an insertion into the collection in both of these algorithms.
Algorithm 1 Build New Annotations Algorithm
1: function BuildNewAnnotations(template_KVPs, target_cells)
2:     new_anns ← ∅
3:     for all (template_KVP, matched_target_cells) ∈ zip(template_KVPs, target_cells) do
4:         new_anns ← new_anns + MatchSourceAnnInTarget(template_KVP, matched_target_cells)
5:     end for
6:     return new_anns
7: end function
Figure 7. Examples of annotations graphs encoding a title block containing a form field and an applicability table (regular table). (a) Source title block annotations graph. (b) Target title block detected annotations graph. (c) Target title block detected and manual annotations graph.
Algorithm 2 Detailed Matching Algorithm
1: function MatchSourceAnnInTarget(source_ann, target_anns)
2:     new_anns ← ∅
3:     matched_target_anns, remaining_target_anns ← FilterTargetAnns(target_anns)
4:     if len(matched_target_anns) = 0 then
5:         Send warning message to user
6:     else
7:         new_anns ← Envelope(matched_target_anns)
8:         for all child_ann ∈ source_ann.children do
9:             new_anns ← new_anns + MatchSourceAnnInTarget(child_ann, remaining_target_anns)
10:        end for
11:    end if
12:    return new_anns
13: end function
Each new manual target annotation is defined as the envelope of a set of detected annotations, selected according to a category-specific rule-based metric. For example, for a RegularTable with expected header values (use_value_as_key field set to True, as described in Section 3.3.3), the Cells that best match the expected header values under the longest common substring metric are selected. Conversely, for a RegularTable with variable header values (use_value_as_key field set to False), the Cells that share the same relative position as the source header are selected. It is at this stage that potential non-compliances of the target title block are identified and corresponding warning messages are generated.
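A minimal sketch of the envelope computation and of the header-selection rule, reusing the hypothetical helpers sketched in Section 3.2.3, could look as follows:

```python
def envelope(boxes):
    """Smallest enclosing rectangular hull of (x_min, y_min, x_max, y_max) boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def select_header_cells(source_headers, target_cells, use_value_as_key):
    """Category-specific selection rule for RegularTable headers (sketch).
    With use_value_as_key=True, cells are chosen by text similarity to the
    expected header values; otherwise, by relative position (index here)."""
    if use_value_as_key:
        return [anchor_cell_for_key(text, target_cells) for text in source_headers]
    return target_cells[:len(source_headers)]  # same relative positions, simplified
```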
After all manual annotations from the source have been located in the target title block, key–value extraction is performed by traversing the target annotation graph in a breadth-first manner from the leaves to the root. Each annotation category has a dedicated extraction procedure that aggregates information from its child annotations and propagates it to its parent node. This traversal yields the final structured representation of the title block as a set of key–value pairs.

3.2.4. Summary

The method’s workings can be summarized by the following steps:
1. Extract the structure of the source title block as a set of detected and manual annotations.
2. Build the source title block annotation tree (containing both detected and manual annotations) and export it as a template.
3. Build the target title block annotation tree (containing only detected annotations).
4. Transfer the source manual annotations present in the template to the target title block (matching).
5. Build the completed target title block annotation tree.
6. Traverse the completed target title block annotation tree to extract its key–value pairs.

3.3. Implementation Details

3.3.1. Overview of the System

Our method comprises two primary components: a graphical interface for defining and visualizing title block entities, and a title block extraction script for identifying and retrieving these entities. A template summarizing the structure and content of a source title block is first created by the end user through the graphical interface. This template can then be applied to other target title blocks that conform to the same general layout. The overall workflow of the proposed approach is illustrated in Figure 8.

3.3.2. Feedback Provision to the User

Several complementary information pipelines are implemented within the proposed system to assist the end user during the annotation and validation processes. The first pipeline involves the visualization of annotations overlaid on the title block image. Our method employs an annotation interface that enables both the collection of user-defined (manual) annotations and the display of automatically inferred (detected) annotations, as shown in Figure 6a. Such interfaces offer useful functionalities, including the ability to toggle the visibility of annotations and to modify annotation metadata directly within the annotation window. For the present study, the COCO Annotator software [29] was used to perform this task. Once the annotation process is completed, the title block template is exported via the COCO Annotator REST API in the COCO annotation format.
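Since the template is exported in the standard COCO annotation format, it can be loaded with a few lines of JSON handling. The sketch below assumes the standard COCO keys ("annotations", "categories", with "bbox" stored as (x, y, width, height)); the per-annotation "metadata" field follows the COCO Annotator convention.

```python
import json

def load_coco_template(path):
    """Load a COCO-format template and return a flat list of annotations."""
    with open(path, encoding="utf-8") as f:
        coco = json.load(f)
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    annotations = []
    for a in coco["annotations"]:
        x, y, w, h = a["bbox"]  # COCO stores (x, y, width, height)
        annotations.append({
            "id": a["id"],
            "category": categories[a["category_id"]],
            "bbox": (x, y, x + w, y + h),       # as (x_min, y_min, x_max, y_max)
            "metadata": a.get("metadata", {}),  # COCO Annotator extension
        })
    return annotations
```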
The second pipeline provides a set of human-readable warning messages. These messages are designed to inform the user of potential layout inconsistencies or missing information within the title block. They are displayed in the command-line output, allowing users to identify the affected annotations through their identifiers and subsequently inspect them in the graphical interface. Some examples of warning messages are shown in Appendix C.

3.3.3. Ease of Use Considerations

Our method is designed to leverage end-user domain knowledge of title block layouts while limiting the required effort and avoiding the need for familiarity with the system’s internal mechanisms. To accommodate variability in title block representations, category-specific metadata are introduced, enabling users to adjust the matching procedure according to their expertise regarding the variability observed in their domain. For example, two metadata fields are defined to address variability in regular tables. As discussed in Section 3.1.1, some regular tables, such as revision history tables, exhibit fixed expected header fields (column names), whereas others allow for variable header fields. Similarly, regular tables can differ in their shape between different title blocks of the same template, such as for applicability tables. To support these cases, two boolean metadata fields are introduced: the keep_same_dimensions field and the use_value_as_key field, which allow the user to indicate acceptable types of variability without exhaustively specifying all possible configurations. In this way, users can inject their expert knowledge about the title block layout while keeping the information specification effort limited.
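For illustration, the two flags could be attached to table annotations as in the following hypothetical template excerpt (the flag values are assumptions for this example, not prescriptions):

```python
# Hypothetical metadata for two regular tables from the same template.
revision_history_table = {
    "category": "RegularTable",
    "metadata": {
        "use_value_as_key": True,       # fixed, expected header values
        "keep_same_dimensions": False,  # row count varies with the signatories
    },
}
applicability_table = {
    "category": "RegularTable",
    "metadata": {
        "use_value_as_key": False,      # header values may vary
        "keep_same_dimensions": False,  # table shape may vary across documents
    },
}
```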
To further reduce annotation effort, the matching procedure is designed to operate on loosely defined bounding boxes. For this purpose, it employs an IoU metric with a low acceptance threshold, providing robustness to minor positional inaccuracies in the annotations. As illustrated in the next section, the bounding box of an annotation does not need to align precisely with the corresponding entity in the source title block for the template to be correctly interpreted by the model.
Moreover, several non-coding annotation categories are introduced to enable users to correct potential layout inconsistencies during the template annotation phase. In particular, a Line category allows users to add missing cell border lines, and a Whitepatch category enables them to mask or exclude anomalous regions in the source title block.

4. Results

4.1. Experimental Protocol

Two experiments were conducted: a control experiment and a real-case experiment. In both cases, the source and target title blocks share the same underlying template and originate from the nuclear industry. They include all the generic entity types described in Section 3.1.1, as illustrated in Figure 9.
In the control experiment, the target title block maintains the same overall structure as the source but differs in its content. In contrast, the real-case experiment introduces multiple variations in both structure and content relative to the source title block, see Figure 9. These variations include missing annotations in the target title block, the addition of non-coding cells, and typographical errors in key labels. The complete list of applied distortions is provided in Appendix B.

4.2. Evaluation Metrics

In each experiment, the objective is to assess the model’s ability to detect annotations in a target title block based on a predefined template and to verify that any inconsistencies present in the target title block are appropriately identified and clearly reported to the user.
Two evaluation metrics are introduced. The annotation detection accuracy rate measures the model’s ability to correctly identify annotations in the target title block that conform to the template specifications. The alarm accuracy rate evaluates the model’s effectiveness in flagging appropriate warnings for annotations that deviate from the template rules. Accordingly, each annotation in the source title block contributes to the annotation detection accuracy rate if the corresponding annotation is present in the target and conforms to the template, and to the alarm accuracy rate otherwise. The computation of these metrics is detailed in Figure 10, where TN, TP, FN and FP refer, respectively, to True Negatives, True Positives, False Negatives and False Positives as defined by the criteria in Algorithm 3.
Algorithm 3 Confusion matrix criteria
1: TP, FN, TN, FP ← 0
2: for all target annotations do
3:     if target annotation is source-layout compliant then
4:         if target annotation detected then TP ← TP + 1
5:         else FN ← FN + 1
6:         end if
7:     else
8:         if warning message is produced then TN ← TN + 1
9:         else FP ← FP + 1
10:        end if
11:    end if
12: end for
The evaluation focuses on annotation detection rather than on extracted key–value pairs, as the number of key–value pairs generated from a single annotation can vary substantially depending on the annotation category. In cases where child annotations are missing from the target title block because their parent annotations are also absent, they are counted within the alarm accuracy rate depending on whether their parent annotation has been flagged. This strategy minimizes redundant warning messages and provides a clearer and more concise feedback to the user.
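Assuming the two rates are the ratios consistent with the figures reported in Section 4.3 (detection accuracy as TP/(TP + FN) over compliant annotations, alarm accuracy as TN/(TN + FP) over non-compliant ones), they can be computed as follows:

```python
def detection_accuracy(tp, fn):
    """Share of compliant target annotations that were correctly matched."""
    return tp / (tp + fn)

def alarm_accuracy(tn, fp):
    """Share of non-compliant target annotations correctly flagged with a warning."""
    return tn / (tn + fp)

# Control experiment: 84 matched out of 85 compliant annotations.
print(f"{detection_accuracy(84, 1):.0%}")  # -> 99%
# Real-case experiment: 16 flagged out of 19 non-compliant annotations.
print(f"{alarm_accuracy(16, 3):.0%}")      # -> 84%
```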

4.3. Detailed Results

Figure 11 illustrates the annotations overlaid on the source and target title blocks. The results of the control and real-case experiments are presented for each manual annotation category in Table 1 and Table 2. A comprehensive description of the title block errors and the corresponding warning logs is provided in Appendix C. Finally, the sources of model-generated warnings and errors are broken down by annotation category in Table 3.

4.3.1. Control Experiment Results

In the control experiment, the model successfully matched identical fields in the target title block using the template generated from the annotated source title block, achieving an annotation detection accuracy rate of 99% (84 true positives, i.e., properly matched compliant target annotations, out of 85 true annotations, i.e., compliant target annotations), see Table 1. The only unmatched annotation corresponds to a KeyValuePair representing the document identifier encoded as a comb field. This field does not contain an explicit key and therefore must be matched based on its relative position, a feature that is not yet implemented in the current version of the system. Further details are discussed in Section 5.

4.3.2. Real-Case Experiment Results

In the real-case experiment, the model achieved an annotation detection accuracy rate of 98% (65 true positives out of 66 true annotations), see Table 2, despite the structural and textual distortions introduced in the target title block shown in Figure 9. As in the control experiment, the only unmatched annotation corresponds to the KeyValuePair representing the comb field. Notably, the model successfully matched tabular fields in the target title block, demonstrating robustness to variations in table dimensions, style, and header content.
The model also exhibited reliable performance in detecting and reporting template layout inconsistencies, achieving an alarm accuracy rate of 84% (16 true negatives, i.e., properly flagged non-compliant target annotations, out of 19 false annotations, i.e., non-compliant target annotations), see Table 2. As shown in Table 3, all template errors related to missing annotations in the target title block were correctly identified and reported to the user (true negatives). In addition, a positioning error involving a TextBlock within a NamedCheckBox was correctly flagged (specifically the “interne” checkbox within the “Accessibilité XXX” KeyValuePair located at the bottom of the title block).
Most spelling-related discrepancies between keys were also successfully detected, although certain minor variations were missed (false positives in the table). These undetected cases typically involved single-character differences between the source and target, such as “Nom du sous-traitant” versus “Nom du sous traitant”, resulting in incomplete matching of the longest common substring used for key comparison.

5. Discussion

5.1. Title Block Extraction Performance

Our method demonstrates the ability to successfully match a wide range of generic entities commonly found in title blocks and to flag non-compliances, provided that a shared template is available. The matching procedure, while still imperfect, has shown some robustness to typical distortions between source and target title blocks while giving the user the ability to control possible variabilities through a graphical interface.
However, two main limitations remain in the current implementation. First, comb fields are not yet supported and require the addition of specific rules to enable accurate matching in the target title block. Second, while the model is able to flag several keys in the presence of minor spelling non-compliances, such discrepancies are not currently consistently reported to the user as warning messages. When the bounding box of a key in the target title block does not fully cover the words containing typographical errors, the matching operation may fail silently and not trigger a warning. This limitation highlights the constraints of using the longest common substring as the sole similarity metric for key matching. Potential improvements include the integration of alternative similarity measures during the envelope construction phase, such as the Levenshtein distance, which accounts for character-level substitutions. Another limitation of the system is that it uses title block cells as anchor points during the matching step, and therefore assumes that the title block is defined as a set of well-formed cells, with no “floating” information. This assumption is consistent with existing standards (ISO 7200:2004 [23]), but implies that the system cannot robustly handle poorly defined cells, such as those bounded by dotted lines. This limitation is, however, shared with other title block extraction software reported in the literature [9].
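To illustrate the proposed improvement, the sketch below contrasts the two metrics on the failure case of Section 4.3.2: a plain dynamic-programming Levenshtein distance (our illustrative implementation, not the system’s) immediately exposes the single-character deviation that the longest common substring only reflects indirectly.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# The missed case from Section 4.3.2: one substituted character.
print(levenshtein("Nom du sous-traitant", "Nom du sous traitant"))  # -> 1
```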

5.2. User-Facing Visualization and Feedback Quality

Template creation can be performed entirely through the graphical interface, which allows users to define annotation bounding boxes and optional metadata. As illustrated in the overlaid annotation figures, the annotation process is user-friendly, as the model does not require precise bounding box placement for accurate interpretation. The resulting matched annotations can be superimposed on the target title block, providing users with visual feedback on the hierarchical structure of the extracted entities. When combined with the generated warning logs, this visualization framework offers reliable diagnostic support, helping users to identify and understand both content-related errors (e.g., misspellings) and layout-related issues (e.g., missing or misplaced entities) within the title block.

5.3. Further Research Directions

5.3.1. Generalization Capabilities of the Method on a Broader Dataset

The generalization capabilities of the proposed method should be evaluated on a broader and more diverse corpus of title blocks. At present, we are not aware of any public dataset specifically designed for industrial title block structural compliance checking, i.e., providing both title block content and explicit annotations of structural or layout non-compliances. This gap likely reflects the fact that title block structural compliance is rarely formalized as a benchmark task in the document analysis literature.
To support objective and reproducible evaluation, we propose a synthetic benchmarking strategy. The approach consists of starting from an existing public table dataset, such as PubTables-1M [28], or document layout dataset, and then procedurally injecting controlled non-compliances (e.g., missing mandatory fields, swapped blocks, forbidden relocations, invalid nesting, or inconsistent table structures). Such a benchmark would enable systematic stress-testing of the violation detection component and a more rigorous assessment of the relevance and clarity of the feedback generated for the user.
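A possible shape for this injection step is sketched below; the violation taxonomy and offsets are illustrative assumptions, and annotations follow the dictionary layout used in the earlier sketches.

```python
import random

def inject_noncompliance(annotations, rng=None):
    """Inject one synthetic violation and return (distorted, ground_truth)."""
    rng = rng or random.Random(0)
    anns = [dict(a) for a in annotations]  # shallow copies; inputs stay intact
    kind = rng.choice(["missing_field", "swapped_blocks", "relocated_field"])
    if kind == "missing_field":
        removed = anns.pop(rng.randrange(len(anns)))
        return anns, {"violation": kind, "target": removed["id"]}
    if kind == "swapped_blocks":
        i, j = rng.sample(range(len(anns)), 2)
        anns[i]["bbox"], anns[j]["bbox"] = anns[j]["bbox"], anns[i]["bbox"]
        return anns, {"violation": kind, "target": (anns[i]["id"], anns[j]["id"])}
    # relocated_field: shift one bounding box by a forbidden offset
    k = rng.randrange(len(anns))
    x0, y0, x1, y1 = anns[k]["bbox"]
    anns[k]["bbox"] = (x0 + 50, y0, x1 + 50, y1)
    return anns, {"violation": kind, "target": anns[k]["id"]}
```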

5.3.2. Deployment and User Evaluation in Operational Conditions

Although the proposed strategy shows robustness under the distortions considered in our experiments, both its resilience to real production variability and its usability must still be validated in an operational setting. As a next step, we plan to integrate the method into the production infrastructure and evaluate it under real-world conditions. This deployment will involve developing a dedicated annotation interface to replace COCO Annotator, with the goal of presenting warning messages directly within the graphical environment.
After deployment, we will organize workshops with end users to collect qualitative feedback on two aspects: (i) the accuracy of the extracted key–value pairs across a diverse set of industrial title blocks and (ii) the ease of use of the production graphical interface. To support this evaluation and promote consistent usage, a user guide with illustrative annotation examples is currently being prepared.

5.3.3. Scalable Management of Multiple Templates

Because the proposed framework relies on manually defined templates, its scalability when a large number of templates coexist in an industrial environment warrants further consideration. In our expected use case, title block templates are often client- or subcontractor-dependent; accordingly, templates can first be organized by target client. To select a template within a client-specific subset, we envision combining two complementary procedures.
  • Key-term driven preselection: maintain a lookup table that lists the key terms associated with each template. For a given document, OCR-detected text blocks can be compared against this inventory to propose the most likely template, for instance by selecting the template whose key terms are most consistently present. This step requires an appropriate text similarity measure and can be complemented by the coarse structural description provided by the initial annotation graph (a minimal selection sketch is given at the end of this subsection).
  • User-assisted disambiguation: if multiple templates remain plausible, the graphical interface can offer a side-by-side visualization to help the user validate the best match by visually comparing candidate templates against the target title block.
In addition, templates may be enriched with metadata such as a standardized template name, a version identifier, and provenance information. The interface could further support a diff-like comparison workflow to distinguish highly similar variants. While variability handling may limit the number of templates required in practice, this assumption should be validated on operational datasets.
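The sketch below illustrates the key-term driven preselection step. The lookup table contents, similarity criterion (simple substring matching on normalized text), and function names are illustrative assumptions rather than the deployed implementation.

```python
# Minimal sketch (an assumption, not the deployed selector): key-term driven
# template preselection. Each template is described by the key terms it is
# expected to contain; OCR text blocks vote for the template whose key terms
# are most consistently present. Data and names are illustrative.

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def preselect_template(ocr_text_blocks: list[str],
                       template_key_terms: dict[str, list[str]],
                       top_k: int = 2) -> list[tuple[str, float]]:
    """Rank templates by the fraction of their key terms found in the OCR output."""
    document_text = " ".join(normalize(t) for t in ocr_text_blocks)
    scores = []
    for template_name, key_terms in template_key_terms.items():
        hits = sum(1 for term in key_terms if normalize(term) in document_text)
        scores.append((template_name, hits / max(len(key_terms), 1)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]  # remaining ambiguity is left to user-assisted disambiguation

# Illustrative lookup table for two client-specific templates.
templates = {
    "client_A_v2": ["type de document", "nom du sous-traitant", "tableau de signatures"],
    "client_B_v1": ["document type", "subcontractor name", "revision history"],
}
ocr_blocks = ["TYPE DE DOCUMENT", "Nom du sous-traitant", "Tableau de signatures", "XXX"]
print(preselect_template(ocr_blocks, templates))
```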

5.3.4. Improvement of the Key–Value Pair Export Format

An additional aspect identified for improvement concerns the readability and usability of the exported key–value pair format. Preliminary workshops with end users, aimed at evaluating the quality of the extracted key–value pairs, have led to the definition of several new annotation categories and processing rules designed to enhance this aspect of the system:
  • Introduction of a RadioButton annotation category to represent multiple mutually exclusive Boolean values as a single formatted output.
  • Modification of the SubCell annotation category to allow customization of cell content processing when a cell contains multiple key–value pairs.
  • Addition of a user-selectable export option for regular tables, allowing data to be exported either as individual key–value pairs for each non-null element (suitable for sparsely populated tables such as applicability tables) or as grouped sets of key–value pairs (e.g., the individual lines of history revision tables).
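To make the last export option concrete, the following sketch contrasts the two export modes on a toy revision-history table. The representation of tables as header and row lists, the key format, and all names are assumptions for illustration only, not the production exporter.

```python
# Minimal sketch (illustrative, not the production exporter): the two export
# modes discussed above for regular tables. A table is represented as a list of
# column headers plus rows of cell values.

def export_individual(headers: list[str], rows: list[list[str]]) -> dict[str, str]:
    """One key-value pair per non-null cell, keyed by 'row_index.column_header'."""
    pairs = {}
    for row_index, row in enumerate(rows, start=1):
        for header, value in zip(headers, row):
            if value:  # skip empty cells (suited to sparse applicability tables)
                pairs[f"{row_index}.{header}"] = value
    return pairs

def export_grouped(headers: list[str], rows: list[list[str]]) -> list[dict[str, str]]:
    """One dictionary per table line (suited to revision history tables)."""
    return [dict(zip(headers, row)) for row in rows]

headers = ["Rev", "Date", "Author"]
rows = [["A", "2024-01-10", "O.L."], ["B", "", "R.L."]]
print(export_individual(headers, rows))
print(export_grouped(headers, rows))
```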

6. Conclusions

This study reframes title block processing as compliance checking, where the objective is to preserve structure, validate layout/content rules, and generate feedback that engineers can act upon. The proposed framework achieves this by combining (i) a lightweight annotation workflow for capturing end-user knowledge of a title block template and (ii) a graph-based representation that supports structure-aware matching and rule evaluation on new documents.
Across two experiments on industrial title blocks from the nuclear domain, the method demonstrates strong robustness to realistic variability. It matches compliant entities with 98–99% detection performance and identifies most injected deviations, reaching an 84% warning (alarm) accuracy under real-case distortions. Importantly, the approach successfully transfers tabular entities under changes in dimensions and headers when such variability is declared acceptable at the template level, and it provides interpretable overlays and warning messages to support diagnosis.
The current system remains limited by incomplete support for certain entity types (notably comb fields) and by key-matching sensitivity when minor spelling differences occur or when bounding boxes do not fully cover the relevant tokens. Future work will focus on (i) extending entity coverage and similarity measures, (ii) integrating warnings directly into a dedicated annotation/review interface, (iii) scaling template selection and versioning in multi-contractor environments, and (iv) developing a broader benchmark—potentially via controlled synthetic distortions—to evaluate structural compliance detection more rigorously.

Author Contributions

Conceptualization, O.L. and R.L.; methodology, O.L.; software, O.L.; validation, Q.R. and R.L.; formal analysis, O.L.; investigation, O.L.; resources, R.L.; data curation, R.L. and O.L.; writing—original draft preparation, O.L.; writing—review and editing, K.N., Q.R. and R.P.; visualization, O.L.; supervision, R.L., H.D. and R.P.; project administration, H.D. and N.B.; funding acquisition, N.B. and R.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets presented in this article are not readily available because of confidentiality restrictions.

Acknowledgments

The authors would like to express their gratitude to the following individuals for their valuable contributions to this work. We acknowledge the contribution of the DEC DevOps and DS teams, who produced the set of requirements that launched the initial draft of this work. We particularly thank Baptiste Frelet, Paul Bridier and Quentin Robcis for code review and general guidance regarding the architecture. We extend our thanks to Andres Camilo Murillo Coba and Sagar Jose for their day-to-day feedback and suggestions regarding emerging technologies in the field. Finally, special thanks go to the participants of the title block extraction evaluation workshop, notably Chabane Guernine, for their precious feedback on the first draft of the method, which provided several clear paths of improvement. During the preparation of this manuscript, the authors used ChatGPT-5.0 Thinking and Google Gemini 3.0 for the purposes of spell checking, rewording and image generation. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Authors Olivier Laurendin, Khwansiri Ninpan, Quentin Robcis, Richard Lehaut, Hélène Danlos and Nicolas Bureau were employed by the Digital Excellence Center department of the company Assystem EOS. Author Robert Plana was employed by the Technology & Innovation department of the company Assystem EOS. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BIM: Building Information Modeling
BOM: Bill of Materials
CAD: Computer-Aided Design
COCO: Common Objects in Context
FP: False Positive
FN: False Negative
IoU: Intersection over Union
ISO: International Organization for Standardization
OCR: Optical Character Recognition
TP: True Positive
TN: True Negative

Appendix A. Annotation Types Taxonomy

As stated above, annotation categories can be divided into two main groups: detected categories, which are extracted directly from the image, and manual categories, which are provided by the user. Annotation categories are organized following a hierarchical structure (the annotation categories graph) presented in Figure 5. In addition, some other categories are introduced to ease the template annotation process for the end user.

Appendix A.1. Detected Categories

  • Root: The title block boundaries.
  • Cell: A box of the title block defined by 4 line segments.
  • SubCell: A Cell annotation contained within another cell.
  • CheckBox: A small square SubCell nested within a NamedCheckBox annotation.
  • TableCell: A Cell (or SubCell) nested within a RegularTable annotation.
  • Cross: Defined by the crossing of two oblique line segments with a great difference in orientation.
  • TextBlock: The box of a word, defined as a continuous set of alphanumeric characters.

Appendix A.2. Manual Categories

  • KeyValuePair: Groups related annotations together with their associated key, typically a group of Cells, one of which contains a Key. Used to partition the elements of the title block along the expected output key–value pairs.
  • Key: Groups a set of TextBlocks depicting a Cell key string.
  • NamedCheckBox: Groups a CheckBox and a set of TextBlocks together to form a selection field.
  • RegularTable: Groups a set of Cells (TableCells) with at least one header, either a ColumnHeaderCell or RowHeaderCell.
  • ColumnHeaderCell: Groups a set of Cells (TableCells) to depict the keys of each column of a RegularTable. Its key can be provided using a TableKeyCell.
  • RowHeaderCell: Groups a set of Cells (TableCells) to depict the keys of each row of a RegularTable. Its key can be provided using a TableKeyCell.
  • TableKeyCell: Selects a Cell (TableCell) whose content is a header cell key.

Appendix A.3. Other Categories

  • Line: To add extra line segments when a cell’s border segments are faulty in the source title block.
  • WhitePatch: To ignore any line segment in the source title block.
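As an illustration of how this taxonomy could be exploited programmatically, the sketch below encodes allowed parent–child relations between categories and checks candidate nesting edges. The exact set of relations is an assumption inferred from the descriptions above, not a verbatim transcription of Figure 5.

```python
# Minimal sketch (assumption, not the exact graph of Figure 5): encoding the
# annotation category hierarchy as allowed parent -> children relations, which
# can then be used to validate that an annotation graph respects the taxonomy.
ALLOWED_CHILDREN: dict[str, set[str]] = {
    "Root": {"KeyValuePair", "Cell", "RegularTable"},
    "KeyValuePair": {"Cell", "Key", "NamedCheckBox"},
    "Cell": {"SubCell", "TextBlock"},
    "RegularTable": {"TableCell", "ColumnHeaderCell", "RowHeaderCell"},
    "NamedCheckBox": {"CheckBox", "TextBlock"},
    "Key": {"TextBlock"},
}

def is_valid_edge(parent_category: str, child_category: str) -> bool:
    """True when the assumed taxonomy allows child_category to be nested in parent_category."""
    return child_category in ALLOWED_CHILDREN.get(parent_category, set())

print(is_valid_edge("KeyValuePair", "Key"))   # True under the assumed taxonomy
print(is_valid_edge("Key", "Cell"))           # False
```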

Appendix B. Considered Distortions in the Real-Case Experiment

The real-case experiment introduces multiple distortions in both structure and content relative to the source title block. These distortions include missing annotations in the target title block, typographical errors in key labels, and layout differences such as a checkbox misplaced with respect to its text block.

Appendix B.1. Missing Annotations

The key value pairs of explicit keys “Jalon contractuel”, “CONTIENT DU SAVOIR FAIRE FOURNISSEUR” and “CONTIENT DU SAVOIR FAIRE XXX” are absent in the target title block.

Appendix B.2. Typographical Errors

The typographical errors between the source and target title blocks are listed in Table A1.
Table A1. Target text blocks containing typographical errors, together with their associated source text blocks for comparison.

Source Title Block Text Blocks                        Target Title Block Text Blocks
“Mots clés(s)”                                        “Mots clés”
“Projet(s) ou Programme(s) nationaux concerné(s)”     “Projet (s) ou programme(s) nationaux concerné(s)”
“Type de document (produit type)”                     “Type de document (produit-type)”
“Nom du sous traitant”                                “Nom du sous-traitant”
“Référence du sous traitant”                          “Référence du sous-traitant”
“COPYRICHT XXX”                                       “COPYRIGHT XXX”
“I”, “S”                                              “OUI”, “NON”

Appendix B.3. Layout Differences

  • An additional empty line is added below the “peigne” (comb field) in the target title block.
  • The regular table in the key value pair of key “Tableau de signatures” does not share the same dimensions as its counterpart in the source.
  • The regular table in the key value pair of key “Applicabilité” does not share the same column headers and dimensions as its counterpart in the source.
  • The key value pair of key “Référence XXX” has additional (empty) sub-cells in the target.
  • The checkboxes in the key value pair of key “Critère I ou S” are named “I” and “S” in the target title block as opposed to “OUI” and “NON” in the source title block.
  • The checkbox named “INTERNE” in the key value pair of key “Accessibilité XXX” has its name below it in the target title block, as opposed to on its right in the source title block.

Appendix C. In-Depth Description of the Title Block Errors and Warning Logs

The warning logs produced by the model during the real-case experiment are provided below. They are split into true negative logs, which depict the distortions described in Appendix B, and false negative logs, which correspond to annotations missed during the matching process.

Appendix C.1. True Negative Logs

Source KeyValuePair of id 20978, key “jalon contractuel” and value “None” not
matched in target Root of id 25604.
Source KeyValuePair of id 20987, key “projets ou programmes nationaux concernes”
and value “None” not matched in target Root of id 25604.
Source KeyValuePair of id 21000, key “contient du savoir faire fournisseur” and
value “[{’oui’: True}, {’non’: False}]” not matched in target Root of id 25604.
Source KeyValuePair of id 21003, key “copyricht” and value “XXX 2022” not matched
in target Root of id 25604.
Target Cell of id 25687 and value “projet s ou programmes nationaux concernes”
not matched in source Root of id 25604.
Target Cell of id 25709 and value “copyright XXX 2024” not matched in source Root
of id 25604.
Source TextBlock of value oui not matched in target NamedCheckBox of id 21040 and
key “oui”.
Source TextBlock of value non not matched in target NamedCheckBox of id 21041 and
key “non”.
Source CheckBox of location [-1, 0] not matched in target NamedCheckBox of id
21043 and key “interne”.

Appendix C.2. False Negative Logs

Source KeyValuePair of id 20972, key “peigne didentification du document” and
value “None” not matched in target Root of id 25604.
Target Cell of id 25663 and value “X X X X X X X X X X X X X X X X X X X” not
matched in source Root of id 25604.

References

  1. Drawing Sheets/Title Blocks. Available online: https://www.roymech.co.uk/Useful_Tables/Drawing/Title_blocks.html (accessed on 23 December 2025).
  2. Toro, J.V.; Tarkian, M. Optimizing Text Recognition in Mechanical Drawings: A Comprehensive Approach. Machines 2025, 13, 254. [Google Scholar] [CrossRef]
  3. Najman, L.; Gibot, O.; Barbey, M. Automatic Title Block Location in Technical Drawings. In Proceedings of the 4th IAPR International Workshop on Graphics Recognition, Kingston, ON, Canada, 7–8 September 2001. [Google Scholar]
  4. Li, G.; Peng, Q.; Luo, M. Intelligent Extraction of Multi-style and Multi-template Title Block Information Based on Fuzzy Matching. Appl. Artif. Intell. 2024, 38, 2327005. [Google Scholar] [CrossRef]
  5. Lombardi, A.; Duan, L.; Elnagar, A.; Zaalouk, A.; Ismail, K.; Vakaj, E. Title Block Detection and Information Extraction for Enhanced Building Drawings Search. arXiv 2025, arXiv:2504.08645. [Google Scholar] [CrossRef]
  6. Maupou, C.; Yang, Y.; Fodop, G.; Qie, Y.; Migliorini, C.; Mehdi-Souzani, C.; Anwer, N. Automatic Raster Engineering Drawing Digitisation for Legacy Parts towards Advanced Manufacturing. Procedia CIRP 2024, 129, 234–239. [Google Scholar] [CrossRef]
  7. Sulaiman, R.; Fahmi Mohamad Amran, M.; Amlya Abd Majid, N. A Study on Information Extraction Method of Engineering Drawing Tables. Int. J. Comput. Appl. 2012, 50, 43–47. [Google Scholar] [CrossRef]
  8. Haar, C.; Kim, H.; Koberg, L. AI-Based Engineering and Production Drawing Information Extraction. In Flexible Automation and Intelligent Manufacturing: The Human-Data-Technology Nexus; Kim, K.Y., Monplaisir, L., Rickli, J., Eds.; Springer: Cham, Switzerland, 2023; pp. 374–382. [Google Scholar] [CrossRef]
  9. AI Feature Extraction for Technical Drawings. Available online: https://werk24.io/ (accessed on 23 December 2025).
  10. Support | Kahua. Available online: https://resources.kahua.com/customer-training-videos-specialty-apps-and-others/part-4-using-title-block-extraction (accessed on 22 January 2026).
  11. Automated Drawing Extraction | Autodesk. Available online: https://help.autodesk.com/view/DOCS/ENU/?guid=Automated_Drawing_Extraction (accessed on 23 December 2025).
  12. laujan. Custom Template Document Model—Document Intelligence—Azure AI Services. 2024. Available online: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/train/custom-template?view=doc-intel-4.0.0 (accessed on 23 December 2025).
  13. Custom Extractor Overview|Document AI. Available online: https://docs.cloud.google.com/document-ai/docs/custom-extractor-overview (accessed on 23 December 2025).
  14. Bedrock Data Automation Projects—Amazon Bedrock. Available online: https://docs.aws.amazon.com/bedrock/latest/userguide/bda-projects.html (accessed on 23 December 2025).
  15. Shen, Z.; Zhang, R.; Dell, M.; Lee, B.C.G.; Carlson, J.; Li, W. LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis. arXiv 2021, arXiv:2103.15348. [Google Scholar] [CrossRef]
  16. Huang, Y.; Lv, T.; Cui, L.; Lu, Y.; Wei, F. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. arXiv 2022, arXiv:2204.08387. [Google Scholar] [CrossRef]
  17. Kim, G.; Hong, T.; Yim, M.; Nam, J.; Park, J.; Yim, J.; Hwang, W.; Yun, S.; Han, D.; Park, S. OCR-free Document Understanding Transformer. arXiv 2022, arXiv:2111.15664. [Google Scholar] [CrossRef]
  18. Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633. [Google Scholar] [CrossRef]
  19. Cui, C.; Sun, T.; Lin, M.; Gao, T.; Zhang, Y.; Liu, J.; Wang, X.; Zhang, Z.; Zhou, C.; Liu, H.; et al. PaddleOCR 3.0 Technical Report. arXiv 2025, arXiv:2507.05595. [Google Scholar] [CrossRef]
  20. Mindee/Doctr. Available online: https://github.com/mindee/doctr.
  21. JaidedAI/EasyOCR. Available online: https://github.com/JaidedAI/EasyOCR.
  22. Facebookresearch/Detectron2. Available online: https://github.com/facebookresearch/detectron2.
  23. ISO 7200:2004. Available online: https://www.iso.org/fr/standard/35446.html (accessed on 23 December 2025).
  24. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2016, arXiv:1506.02640. [Google Scholar] [CrossRef]
  25. OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774. [Google Scholar] [CrossRef]
  26. Wang, P.; Bai, S.; Tan, S.; Wang, S.; Fan, Z.; Bai, J.; Chen, K.; Liu, X.; Wang, J.; Ge, W.; et al. Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution. arXiv 2024, arXiv:2409.12191. [Google Scholar] [CrossRef]
  27. laujan. General Key-Value Extraction—Document Intelligence—Foundry Tools. Available online: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/general-document?view=doc-intel-4.0.0 (accessed on 23 December 2025).
  28. Smock, B.; Pesala, R.; Abraham, R. PubTables-1M: Towards Comprehensive Table Extraction from Unstructured Documents. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 4624–4632. [Google Scholar] [CrossRef]
  29. Stefanics, D.; Fox, M. COCO Annotator: Web-Based Image Segmentation Tool for Object Detection, Localization, and Keypoints. ACM SIGMultimedia Rec. 2022, 13, 7. [Google Scholar] [CrossRef]
Figure 1. An example industrial title block taken from a technical drawing with a set of named fields and a revision history table [1].
Figure 2. Presentation of a mock-up graphical interface illustrating the visualization of title block annotations, either defined from a template or inferred by an OCR-based pipeline on a synthetic industrial title block. Field values are displayed as colored bounding boxes overlaid on the title block, together with their associated keys, to provide the end user with localized structural information.
Figure 3. Demonstration of the behavior of a data-driven document understanding model (Azure Document Intelligence, General Document model) on two title blocks containing the same information (same key-value pairs, each represented by a different color) but exhibiting different structures. (a,d) Input title blocks. (b,e) Key–value pair extraction outputs: identical key–value pairs are returned for both title blocks, despite the structural differences, which prevents detecting structural non-compliances from this representation alone. (c,f) Layout/table extraction outputs: tabular structures are provided with spatial organization preserved, enabling downstream structural compliance checking provided that keys and values are reliably linked.
Figure 4. Examples of the generic entities considered for this study. Fields were anonymized for privacy. (a) Form field (set of explicit key–value pairs). (b) Radio button (single output choice field). (c) Revision history table (regular table of variable number of lines). (d) Comb field (text field with no explicit key). (e) Named check box (boolean choice field). (f) Applicability table (regular table of variable shape and headers).
Figure 5. The annotation categories graph considered for this study, depicting the possible hierarchical relationships between annotations based on their category, from a singular Root annotation depicting the entirety of the title block down to individual TextBlock and Cross annotations.
Figure 6. (a) Manual (top) and detected (bottom) annotations overlaid on a form field. (b) Corresponding annotation graph for the form field.
Figure 8. Workflow of the production of the template from the source title block (a) and its application to a target title block (b). First, a source title block template is created by defining title block annotations through a graphical interface, checking the extracted information, and storing the validated template in a centralized library for future reuse. Second, the template is applied to a target title block by selecting an appropriate template from the library, presenting any error messages generated during application to support user interpretation, and optionally visualizing the extracted information to facilitate verification of the extraction results.
Figure 9. Visualization of title block distortions in the source (a) and target (b) in the real-case experiment. In red are missing annotations, in green are layout differences and in blue are typographical differences. All distortions in dotted lines are variabilities (compliant distortions following client specifications) and are indicated as such in the template using metadata fields presented in Section 3.3.3, while those in solid lines are non-compliances that must be flagged by the system. Variable fields include the revision history table (at the very top), whose number of lines is variable, and the applicability table (bottom-right), whose layout and column names are variable.
Figure 10. Annotation matching evaluation metrics formulations. The annotation detection accuracy rate is effectively the proportion of properly matched compliant target annotations out of compliant target annotations (i.e., recall), while the alarm accuracy rate is the proportion of properly flagged non-compliant target annotations among non-compliant target annotations (i.e., specificity). The values of TP, TN, FP and FN are calculated following Algorithm 3.
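Assuming the standard definitions of recall and specificity referenced in the caption, these two metrics correspond to: annotation detection accuracy rate = TP / (TP + FN); alarm accuracy rate = TN / (TN + FP).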
Figure 11. Overlaid manual annotations on the source (a) and target (b) title blocks. The colors of the annotations encode their category, as defined in Figure 5. Annotations over the target title block fit the boundaries of underlying cells or text as they are automatically extracted, whereas those over the source title block fit loosely as they were input by the user. All compliant target annotations have been properly matched (except the comb field), including those that have a compliant distortion (variability) as indicated in Figure 9.
Table 1. Control experiment results. The number of annotations for each annotation category in the source and target title blocks is reported, together with their classification following the confusion matrix defined in Figure 3.

Annotation Category    True Annotations      Target Annotations
                       Source    Target      TP    FN    TN    FP
KeyValuePair           34        34          33    1     -     -
Key                    29        29          29    -     -     -
NamedCheckBox          14        14          14    -     -     -
ColumnHeaderCell       3         3           3     -     -     -
RowHeaderCell          1         1           1     -     -     -
RegularTable           2         2           2     -     -     -
TableKeyCell           2         2           2     -     -     -
Total                  85        85          84    1     0     0
Table 2. Real-case experiment results. The number of annotations for each annotation category in the source and target title blocks is reported, together with their classification following the confusion matrix defined in Figure 3.

Annotation Category    True Annotations      Target Annotations
                       Source    Target      TP    FN    TN    FP
KeyValuePair           34        31          28    1     5     -
Key                    29        26          22    -     4     3
NamedCheckBox          14        10          7     -     7     -
ColumnHeaderCell       3         3           3     -     -     -
RowHeaderCell          1         1           1     -     -     -
RegularTable           2         2           2     -     -     -
TableKeyCell           2         2           2     -     -     -
Total                  85        75          65    1     16    3
Table 3. Analysis of warning and error sources along annotation categories and distortion types. False negatives refer to unmatched compliant target annotations; true negatives and false positives are flagged and missed non-compliant target annotations, respectively. The distortions include missing annotations in the target title block (“Missing”), typographical errors in key labels (“Typographical”), layout differences such as a checkbox misplaced with respect to its text block (“Layout”), and an unsupported annotation type, here a comb field (“Unsupported”). Details of the distortions and associated warning messages are provided in Appendix B and Appendix C.

False Negatives     Missing    Typographical    Layout    Unsupported    Total
KeyValuePair        -          -                -         1              1

True Negatives      Missing    Typographical    Layout    Unsupported    Total
KeyValuePair        3          2                -         -              5
Key                 2          2                -         -              4
NamedCheckBox       4          2                1         -              7

False Positives     Missing    Typographical    Layout    Unsupported    Total
Key                 -          3                -         -              3