More and more contemporary companies are shifting their business paradigm from the design and manufacture of artefacts to the provision of through-life support. This leads to knowledge-intensive products whose information covers a large area, including not only engineering issues but also human and business-related ones, and spans the whole product life cycle [1]. In addition, the global economy has increased the amount of collaboration in modern engineering design practice. With designers working in disparate locations, it is becoming particularly important (and more difficult) to capture, exchange and reuse the information and experience generated during design activities.
Engineering design is the use of scientific principles, technical information and imagination in the definition of a mechanical structure, machine or system to perform pre-specified functions with the maximum economy and efficiency [3]. Generally, the design process can be regarded as a series of activities that aim to create a desired artefact. This process involves decision-making, problem-solving, the rationale behind decisions, and various information resources (such as catalogues, design/factory guidelines, and property databases). Over the last decades, a great deal of work has been carried out to support engineering design in both the academic and commercial communities. Recent examples include a number of Computer Aided Design (CAD) systems [e.g. 4], various design process models [e.g. 7] and rationale tools [e.g. 10], and several techniques for Computer Supported Cooperative Work (CSCW) [e.g. 12]. However, such traditional supporting tools are still insufficient for today’s industry because of various limitations, such as the lack of engineering context and semantics in CAD models and the extra manual work they require.
Recently, with the rapid development and adoption of digital technologies, annotation and markup have attracted increasing attention for information communication, retrieval and management. Correspondingly, some critical research has already been conducted on supporting and capturing the engineering design process. This paper presents a state-of-the-art review of recent research on markup for engineering design, aiming to provide a complete view of markup requirements, technique status, applications and future directions. The remainder of the paper is organized as follows. Section 2 introduces the definition and classification of annotation and markup. Sections 3 and 4 review the current core markup languages and markup strategies, respectively. Their applications and future use in engineering design, including engineering document management, support of multiple viewpoints, product lifecycle information support, design communication in collaborative environments, and integration of engineering design processes, are discussed comprehensively in Section 5. Finally, conclusions are drawn in Section 6 and future research directions are suggested based on the discussion in Section 7.
2. Annotation and markup
Annotation can be simply understood as the act of adding information for various purposes, such as commentary, interpretation from a particular viewpoint, or extra description. Annotation has long been part of engineering practice as an aid to communication: for example, engineers discuss a product face-to-face by annotating a design drawing, or send an annotated drawing of a product to colleagues (an example is shown in Figure 1). Markup may be defined differently depending on its application domain, such as the publishing industry or computer science. In this paper, markup is regarded as a subtype of annotation and is defined as a formally structured annotation for a purpose, normally to allow some kind of manipulation of the information entity [14]. Over the last two decades, many markup methods have been developed, and these can be classified into six basic categories [15]:
Punctuational mark-up: where word, phrase, and sentence boundaries are identified by spaces, commas, full stops, and other punctuation characters inserted into the text.
Presentational mark-up: where the visual form of the document is specified directly.
Procedural mark-up: in which presentational instructions (or commands) for some particular processing system are embedded in the text.
Descriptive mark-up: the author identifies the element types as tokens, as often found in applications of SGML and XML, which approach documents as structured objects containing semantically interpretable parts.
Referential mark-up: refers to entities external to the document and is replaced by those entities during processing.
Meta-mark-up: provides a facility for controlling the interpretation of mark-up and for extending the vocabulary of descriptive mark-up languages (e.g. macros).
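To make the last two categories more concrete, the following minimal Python sketch processes a small descriptive-markup fragment that also contains referential markup and meta-markup. It assumes a hypothetical entity table (`ENTITIES`) reached through XML-style `&name;` references, and a hypothetical macro syntax `\name{arg}` whose definitions (`MACROS`) expand into descriptive elements. Neither the names nor the syntax come from the systems reviewed here.

```python
import re

# Hypothetical store of external content: with referential markup the document
# holds only a reference (&name;), which is replaced during processing.
ENTITIES = {"matspec": "6061-T6 aluminium alloy"}

# Hypothetical meta-markup: each macro extends the descriptive vocabulary by
# expanding \name{arg} into a fragment of descriptive markup.
MACROS = {"tol": '<tolerance unit="mm">{0}</tolerance>'}


def resolve_references(text: str, entities: dict) -> str:
    """Replace every &name; reference with the entity it points to."""
    return re.sub(r"&(\w+);", lambda m: entities[m.group(1)], text)


def expand_macros(text: str, macros: dict) -> str:
    """Expand every \\name{arg} macro call into the markup it stands for."""
    return re.sub(
        r"\\(\w+)\{([^}]*)\}",
        lambda m: macros[m.group(1)].format(m.group(2)),
        text,
    )


def process(source: str) -> str:
    """Resolve references first, then expand macros."""
    return expand_macros(resolve_references(source, ENTITIES), MACROS)


source = r"<part><material>&matspec;</material>\tol{0.05}</part>"
print(process(source))
# <part><material>6061-T6 aluminium alloy</material><tolerance unit="mm">0.05</tolerance></part>
```

After processing, only descriptive markup remains. The document author wrote neither the full material specification nor the tolerance element directly.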
Figure 1. An example of annotation.
To date, digital information has predominantly been stored in different structures and formats, depending on the specific software used. However, such information commonly needs to be translated into other formats (e.g. Microsoft Word, HTML, PDF) for purposes such as presentation and accessibility in different contexts. A typical example is a graph of experimental results embedded in a Microsoft Word document. This traditional approach succeeds in visualizing the graph but fails to retain other information, such as how the results were produced and the raw data the graph represents. Descriptive markup offers potential benefits in solving this problem. Firstly, descriptive markup lets users focus on the structure and content of a document, which gives it greater portability. Secondly, descriptive markup separates how data is stored from how it is used. It can therefore potentially provide any additional information required when the object is transformed into different formats for different purposes, or re-processed and re-organized for different applications and uses.
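The second point can be illustrated with the embedded-graph example. The minimal Python sketch below assumes a hypothetical XML vocabulary (`graph`, `caption`, `method`, `data`, `point`) and uses invented numbers. Because the markup describes what each part is rather than how it looks, the same source can be rendered as a display-only HTML figure or re-processed into the raw data with its provenance. A pasted image cannot provide the second view.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical descriptive markup for the embedded-graph example: element names
# say what each part *is* (caption, method, data), not how it should look.
# The numbers are invented for illustration.
GRAPH_XML = """
<graph id="fig-tensile">
  <caption>Stress against strain for specimen A</caption>
  <method>Tensile test, crosshead speed 2 mm/min</method>
  <data x="strain" y="stress_MPa">
    <point x="0.001" y="70"/>
    <point x="0.002" y="140"/>
    <point x="0.003" y="205"/>
  </data>
</graph>
"""


def to_html(graph: ET.Element) -> str:
    """Presentation view: only the caption is needed to display the figure."""
    fig_id = html.escape(graph.get("id", ""))
    caption = html.escape(graph.findtext("caption", ""))
    return f'<figure id="{fig_id}"><figcaption>{caption}</figcaption></figure>'


def to_csv(graph: ET.Element) -> str:
    """Re-processing view: the raw data behind the plot, with its provenance."""
    data = graph.find("data")
    lines = [f"# {graph.findtext('method', '')}", f"{data.get('x')},{data.get('y')}"]
    lines += [f"{p.get('x')},{p.get('y')}" for p in data.iter("point")]
    return "\n".join(lines)


graph = ET.fromstring(GRAPH_XML.strip())
print(to_html(graph))
print(to_csv(graph))
```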
To implement markup and apply it in engineering design, an adequate understanding of markup languages and strategies is essential. These are discussed in the next two sections.
4. Mark-up strategy
The conventional way to mark up a document is ‘in-line’: the markup labels are merged into the text of the document itself. From a computer system’s point of view, the markup changes a document from a long sequence of characters into a block of information with semantic meaning. ‘In-line’ markup is easily accomplished using a text editor or a simple scripting language, so it has become one of the most commonly used methods. However, the ‘in-line’ method has some disadvantages. First, because it inserts markup directly into the text, it actually changes the document. Second, because the markup labels are embedded in the document, it is difficult to place multiple independent sets of markup in the same document without them interfering with each other.
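The minimal Python sketch below illustrates both drawbacks. It assumes XML-style tags, and the example sentence and tag names are hypothetical. Inserting tags rewrites the text itself. When two independent viewpoints mark overlapping spans, their tags cross and the result is no longer well-formed.

```python
import xml.etree.ElementTree as ET


def inline_markup(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Insert <tag>...</tag> pairs directly into the text (in-line markup).

    spans holds (start, end, tag) character ranges of the original text.
    """
    events = []
    for start, end, tag in spans:
        events.append((start, 1, f"<{tag}>"))  # 1: open after any close at same position
        events.append((end, 0, f"</{tag}>"))
    events.sort(key=lambda e: (e[0], e[1]))
    out, pos = [], 0
    for position, _, label in events:
        out.append(text[pos:position])
        out.append(label)
        pos = position
    out.append(text[pos:])
    return "".join(out)


def is_well_formed(marked: str) -> bool:
    """Check whether the marked-up text still parses as XML."""
    try:
        ET.fromstring(f"<doc>{marked}</doc>")
        return True
    except ET.ParseError:
        return False


text = "The bracket is made of 6061 aluminium alloy."

# One set of markup: the document itself is rewritten.
design = inline_markup(text, [(4, 11, "part")])
print(design)          # The <part>bracket</part> is made of 6061 aluminium alloy.
print(design == text)  # False: the original text has been changed

# Two independent viewpoints whose spans overlap produce crossing tags.
both = inline_markup(text, [(23, 37, "material"), (28, 43, "supplier_item")])
print(both)
print(is_well_formed(both))  # False
```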
To overcome this, an alternative method called ‘stand-off’ markup [41] has been introduced. Unlike ‘in-line’ markup, ‘stand-off’ markup stores its content in separate external documents. These documents use a system of references or pointers to identify the element of the original object that each piece of markup refers to. The work of Ding et al., Davies and McMahon [43] and Alink [44] has shown that ‘stand-off’ markup has many advantages over ‘in-line’ markup, including:
It allows the original digital object (including both documents and various digital models) to be progressively expanded with extra information (e.g. semantic context and the underlying rationale) and metadata without changing the representation method used for the original object. It thus provides a good means of supporting the continuous update of information throughout a product lifecycle.
Many layers of markup can be associated with the same object. This allows the object to carry different viewpoints, both concurrent (e.g. those of collaborators or partners during the design phase) and successive (e.g. technology requirements from machining engineers or maintenance information from the in-service phase). With ‘stand-off’ markup, information from various viewpoints with different structures or formats can be carried, while the original object itself remains ‘light’ and can be passed on with only the information necessary for a specific user or purpose.
The information pertaining to one viewpoint can be put in a separate markup file, and multiple independent markup files can be safely applied to the same object. Context-specific information can thus be organized into different viewpoints. It can be tailored into reduced versions for security or IP (Intellectual Property) reasons or to meet the requirements of different target users, and it can be re-organized for various purposes and applications.
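The Python sketch below illustrates these properties under simplifying assumptions. The original object is a plain string and the pointers are character offsets. The layer names, labels and the `confidential` flag are hypothetical. The document is never modified, overlapping marks from different layers coexist without conflict, and a reduced view is produced by selecting layers and filtering marks.

```python
from dataclasses import dataclass

# The original object: stored once and never modified by any markup layer.
DOCUMENT = "The bracket is made of 6061 aluminium alloy."


@dataclass(frozen=True)
class StandOffMark:
    """One markup record held outside the document, pointing into it by offsets."""
    start: int
    end: int
    label: str
    confidential: bool = False


# Independent layers (viewpoints); each could live in its own markup file.
LAYERS = {
    "design": [StandOffMark(4, 11, "part"), StandOffMark(23, 37, "material")],
    "supplier": [StandOffMark(28, 43, "supplier_item", confidential=True)],
}


def resolve(document: str, layer: list[StandOffMark]) -> list[tuple[str, str]]:
    """Follow each pointer back into the unchanged document."""
    return [(m.label, document[m.start:m.end]) for m in layer]


def reduced_view(layers: dict, names: list[str]) -> list[StandOffMark]:
    """Pass on only the selected layers, dropping confidential marks (e.g. for IP)."""
    return [m for n in names for m in layers[n] if not m.confidential]


# Overlapping spans from different layers coexist without interfering.
print(resolve(DOCUMENT, LAYERS["design"] + LAYERS["supplier"]))
print(resolve(DOCUMENT, reduced_view(LAYERS, ["design", "supplier"])))
```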
The major disadvantage of ‘stand-off’ markup lies in its implementation: specifically, how to obtain robust, persistent references back to the object, and how to link from the object to the markup.
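One common mitigation for fragile references is to store the quoted target text, plus a little preceding context, alongside the offsets. When the offsets go stale, the reference can then be re-anchored by searching for that quote. This idea is similar in spirit to the text-quote selector of the W3C Web Annotation Data Model. The Python sketch below assumes a hypothetical `Anchor` class and a plain-string document. It only partially solves the problem: if the quoted text itself is edited, the reference becomes orphaned and needs human attention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anchor:
    """A stand-off reference that stores both a position and the quoted text."""
    start: int
    end: int
    exact: str        # the text the markup originally pointed at
    prefix: str = ""  # a little context before it, to disambiguate repeats


def make_anchor(doc: str, start: int, end: int, context: int = 8) -> Anchor:
    return Anchor(start, end, doc[start:end], doc[max(0, start - context):start])


def reattach(doc: str, a: Anchor) -> Optional[Anchor]:
    """Return an anchor valid for doc, or None if the target text has gone."""
    if doc[a.start:a.end] == a.exact:
        return a                          # offsets still valid
    hit = doc.find(a.prefix + a.exact)    # prefer a match with its context
    if hit >= 0:
        start = hit + len(a.prefix)
    else:
        start = doc.find(a.exact)         # fall back to the quote alone
        if start < 0:
            return None                   # orphaned: needs human review
    return Anchor(start, start + len(a.exact), a.exact, a.prefix)


v1 = "The bracket is made of 6061 aluminium alloy."
a = make_anchor(v1, 23, 37)               # points at "6061 aluminium"

v2 = "The mounting bracket is made of 6061 aluminium alloy."
print(v2[a.start:a.end])                  # stale offsets now point at the wrong text
fixed = reattach(v2, a)
print(v2[fixed.start:fixed.end])          # 6061 aluminium
```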
The following table summarizes the advantages and disadvantages of the two strategies.
Advantages and disadvantages of markup strategies.
Advantages:
No change to the representation method used for the original object
Support of multiple independent sets of markup
Support of progressive information update
Capability of re-organizing markup information for different purposes and applications
Disadvantages:
Difficulty of implementation
Problem of persistent references
Lack of a robust method for maintaining references
Although more work is needed on its implementation, ‘stand-off’ markup has shown greater potential than ‘in-line’ markup in engineering design, particularly given the trend towards managing product information throughout the product lifecycle. In general, product information encompasses a vast, heterogeneous range of information, required and produced by the many individuals and groups who interact with the product throughout its lifecycle. Each of these actors requires a different subset of the product information, structured in a different way. In addition, 3D digital product models (i.e. CAD models) already dominate industrial design. The structure of CAD models (e.g. B-rep) is designed to represent 3D geometrical information and has only limited capability for storing markup natively. Moreover, several geometrical representation methods make it uneconomical or even impossible to insert various types of markup using the ‘in-line’ method. The ‘stand-off’ approach, however, allows markup information to be stored in separate documents and linked back to the model using references or pointers. It therefore effectively extends product models with additional engineering and non-engineering information not currently supported by the established formats. This is discussed further in the next section.
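The Python sketch below illustrates this arrangement. It assumes the model exposes persistent face identifiers, represented here by a hypothetical dictionary standing in for a B-rep model. The face IDs, the file name `bracket_v3.step` and all annotation text are invented. Markup from different lifecycle phases accumulates in a separate file keyed by those identifiers, the model itself is untouched, and each actor can be given only the subset relevant to them.

```python
from dataclasses import dataclass, field

# A drastically simplified stand-in for a B-rep model: faces keyed by
# hypothetical persistent identifiers. Real CAD kernels expose richer topology.
MODEL_FACES = {"F1": "planar", "F2": "cylindrical", "F3": "planar"}


@dataclass
class Annotation:
    target: str  # persistent face identifier in the model
    phase: str   # lifecycle phase that produced the information
    text: str


@dataclass
class MarkupFile:
    """Stand-off markup kept beside, not inside, the model file."""
    model_ref: str
    annotations: list[Annotation] = field(default_factory=list)

    def add(self, target: str, phase: str, text: str) -> None:
        if target not in MODEL_FACES:
            raise KeyError(f"unknown face {target}")
        self.annotations.append(Annotation(target, phase, text))

    def view(self, phases: set[str]) -> dict[str, list[str]]:
        """Group the annotations visible to one actor by the face they describe."""
        out: dict[str, list[str]] = {}
        for a in self.annotations:
            if a.phase in phases:
                out.setdefault(a.target, []).append(a.text)
        return out


markup_file = MarkupFile("bracket_v3.step")
markup_file.add("F2", "design", "Bore sized for bearing fit")
markup_file.add("F2", "manufacturing", "Ream after drilling")
markup_file.add("F1", "in-service", "Fretting observed at 2000 h")

print(markup_file.view({"manufacturing"}))
print(markup_file.view({"design", "in-service"}))
```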
The following figure compares the ‘in-line’ and ‘stand-off’ markup methods.
‘In-line’ and ‘Stand-off’ markup methods.
In practice, markup is done manually, semi-automatically or automatically, depending on how much human effort is involved. Manual markup is the most popular and the most accurate (owing to human interpretation), but also the most labour-intensive. Automatic markup reduces manual intervention and gives a more integral representation of the documents.
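The difference between automatic and semi-automatic markup can be sketched in Python as follows. The sketch assumes simple hand-written regular-expression rules. The rules, tag names and the `confirm` callback are hypothetical, and the approach is far simpler than the learning-based systems reviewed below. In automatic mode every rule proposal is accepted. In semi-automatic mode a human reviewer, represented here by a stand-in function, confirms or rejects each proposal.

```python
import re
from typing import Callable, Optional

# Hypothetical hand-written rules: each proposes a tag wherever its pattern matches.
RULES = [
    ("material", re.compile(r"\b\d{4} aluminium alloy\b")),
    ("dimension", re.compile(r"\b\d+(?:\.\d+)? mm\b")),
]


def propose(text: str) -> list[tuple[int, int, str]]:
    """Automatic step: apply every rule and return (start, end, tag) proposals in text order."""
    return sorted(
        (m.start(), m.end(), tag) for tag, rx in RULES for m in rx.finditer(text)
    )


def mark_up(
    text: str, confirm: Optional[Callable[[str, str], bool]] = None
) -> list[tuple[int, int, str]]:
    """Fully automatic when confirm is None; semi-automatic when a reviewer
    confirms or rejects each proposed (tag, matched text) pair."""
    return [
        (start, end, tag)
        for start, end, tag in propose(text)
        if confirm is None or confirm(tag, text[start:end])
    ]


text = "Wall thickness 2.5 mm; material 6061 aluminium alloy; hole 8 mm."
print(mark_up(text))
# Semi-automatic: a reviewer (here a stand-in lambda) keeps only material tags.
print(mark_up(text, confirm=lambda tag, span: tag == "material"))
```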
Few publications are concerned with automatic and hybrid markup. For example, [45] explored automatic markup using different methods, targeting different types of documents. In [45], the authors reported on the design and implementation of a system that automates the capture of structured documents from the optically recognised form of printed materials. The system identifies elements such as words, sentences, titles and authors, but not logical content elements. A novel system that can automatically mark up text documents into XML is discussed in [46]. The system uses a self-organising map (SOM) algorithm and an inductive learning algorithm. Experiments were carried out with business letters, from which the system can extract elements such as the address, date, salutation, paragraphs and closing.
This system is adaptive in nature and learns from its errors to improve markup accuracy. Cui discussed an automatic markup system based on machine learning methods and enhanced by machine-learned domain rules and conventions [47]; the work concentrates on taxonomic descriptions of plants (flora). Machine learning systems are the state of the art, especially for simple tagging problems. However, knowledge-based systems (mostly rule-based) have traditionally been the top performers in most information extraction benchmarks and still retain some advantages. Feldman, Rosenfeld and Fresko propose a hybrid semantic tagging approach that combines the power of knowledge-based and statistical machine learning [48]. In this approach, the rules for the extraction grammar are written manually, while the probabilities are trained from an annotated corpus. Their experiments show that the hybrid approach outperforms both purely statistical and purely knowledge-based systems. Vargas-Vera et al. present an annotation tool that provides both automated and semi-automated support for marking up Web pages with semantic content [49