ODQ: A Fluid Office Document Query Language
Abstract
1. Introduction and Motivation
2. Design of ODQ
2.1. Design Principles
- (1) ODQ should be a non-procedural query language for fluid office documents, without branch or loop structures, so that it can easily be embedded in any high-level language.
- (2) ODQ should access fluid office documents directly, without any office software, so that it can easily be integrated into fluid office document application development.
- (3) A common document model is required to conceal the format details that differ between document format standards and to improve document interoperability.
- (4) Common functions should be provided to meet the various needs of users, for instance query, update, and delete operations on metadata, paragraphs, sections, tables, and so on.
- (5) ODQ should be independent of platforms, document formats and versions, programming languages, and applications.
- (6) The syntax should be as simple as possible to reduce the learning effort for developers.
2.2. Ontology-Based Common Document Model
- (1) As each fluid office document format has its own document model, the common document model should not only extract the common function points from them, but also hide the differences between them.
- (2) The common document model should be as flat as possible, so that the path expressions in ODQ statements stay simple. The method proposed in [7] can make the common document model sufficiently flat.
- (1) The set of function points C, which includes the most commonly used function points. For example, the document function point represents the whole document and the paragraph function point represents a certain paragraph in a document.
- (2) The property set of function points, which includes all properties of the function points defined in C. Each function point has multiple properties. Table 1 lists some of the properties of the paragraph function point.
- (3) The set of relationships R, which contains not only relationships between function points, but also relationships between a function point and its properties.
  - The part-of relationship describes a function point that is a part of another one. For instance, paragraph is a part of document.
  - The property-of relationship describes a property that a function point has. For example, the paragraph function point has a fontName property.
- (4) The resources O, which include all function points in the various fluid document models.
- (5) The map-to relationship, which maps function points in the common document model to those in a particular document model. For example, paragraph in the common document model maps to the element p in the OOXML document model.
Table 1. Some properties of the paragraph function point.

Properties | Description |
---|---|
text | Textual Content of a Paragraph |
indentLeft | Left Indent Value of a Paragraph |
indentRight | Right Indent Value of a Paragraph |
lineSpaceType | Line Space Type of a Paragraph |
lineSpaceValue | Line Space Value of a Paragraph |
fontName | Font Name of a Paragraph |
fontSize | Font Size of a Paragraph |
fontColor | Font Color of a Paragraph |
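The components listed above can be pictured as a small data structure. The following is only an illustrative sketch in Python, not the authors' implementation: the class `CommonDocumentModel` and its method names are hypothetical, the resources O are left out for brevity, and the only mapping registered is the one the text states (paragraph maps to the element p in OOXML).

```python
from dataclasses import dataclass, field


@dataclass
class CommonDocumentModel:
    """Toy container for the components of the common document model (hypothetical)."""
    function_points: set = field(default_factory=set)  # C
    properties: dict = field(default_factory=dict)     # function point -> property names
    part_of: set = field(default_factory=set)          # (part, whole) pairs, a subset of R
    map_to: dict = field(default_factory=dict)         # (format, function point) -> native element

    def add_function_point(self, name, properties=(), parent=None):
        """Register a function point, its properties, and an optional part-of link."""
        self.function_points.add(name)
        self.properties.setdefault(name, set()).update(properties)
        if parent is not None:
            self.part_of.add((name, parent))

    def has_property(self, function_point, prop):
        """Property-of check: does the function point carry this property?"""
        return prop in self.properties.get(function_point, set())

    def native_element(self, fmt, function_point):
        """Map-to lookup; returns None when no mapping is registered."""
        return self.map_to.get((fmt, function_point))


model = CommonDocumentModel()
model.add_function_point("document")
model.add_function_point(
    "paragraph",
    properties={"text", "indentLeft", "indentRight", "lineSpaceType",
                "lineSpaceValue", "fontName", "fontSize", "fontColor"},
    parent="document",
)
# The only mapping stated in the text: paragraph maps to element p in OOXML.
model.map_to[("OOXML", "paragraph")] = "p"
```

With this layout, a query processor would only ever see the common names (paragraph, fontName) and would consult `map_to` when it has to touch a concrete format.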
2.3. ODQ Syntax Design
- (1) SELECT_STATEMENT ::= “SELECT” (<AttributeList> | <NodeList>) “FROM” <URL> [“WHERE” <ConditionList>]
- (2) <AttributeList> ::= <Attribute> | <AttributeList> “,” <Attribute>
- (3) <URL> ::= <NodeList> [“of” <NodeList>]*
- (4) <ConditionList> ::= <Condition> [(“AND” | “OR”) <Condition>]*
- (5) <Attribute> ::= <DocumentAttribute> | <MetaAttribute> | <SectionAttribute> | <ParagraphAttribute>
- (6) <DocumentAttribute> ::= “text” | “fontName” | “fontSize” | “fontColor”
- (7) <MetaAttribute> ::= “Author” | “Title” | “Creator” | “CreationDate”
- (8) <SectionAttribute> ::= “text” | “fontName” | “fontSize” | “fontColor”
- (9) <ParagraphAttribute> ::= “text” | “indentLeft” | “indentRight” | “fontName” | “fontSize” | “fontColor”
- (10) <NodeList> ::= <Node> [<NumberRange>] | <NodeList> “,” <Node> [<NumberRange>]
- (11) <Node> ::= “document” | “section” | “paragraph” | “table” | “run” | “meta”
- (12) <NumberRange> ::= “all” | <Range> | <NumberList>
- (13) <Range> ::= <NumberList> “-” <NumberList>
- (14) <NumberList> ::= <Number> | <NumberList> <Number>
- (15) <Number> ::= “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9” | “0”
- (16) <Condition> ::= <Attribute> “=” <AttributeValue>
- (17) <AttributeValue> ::= <TextValue> | <FontNameValue> | <FontSizeValue> | <FontColorValue> | <IndentLeftValue>
- (18) <TextValue> ::= <String>
- (19) <FontNameValue> ::= “宋体” | “黑体” | “Times New Roman”
- (20) <FontSizeValue> ::= <NumberList>
- (1) The SELECT clause lists the contents that the query should return. All function points here come from the common document model, but the results are fetched from the underlying fluid office document through the map-to relationship.
- (2) The FROM clause contains a path expression indicating the document from which the content should be obtained. The path expression is a series of function points separated by the of keyword.
- (3) The WHERE clause is optional and states the conditions under which information is included in the result.
- Example 1: Query the text property. Query the text property of the document function point to fetch the textual content of sample.uot.
  SELECT text FROM sample.uot;
- Example 2: Query style properties. Select the fontName and fontSize properties of the second paragraph in the second section of sample.docx.
  SELECT fontName, fontSize FROM paragraph[2] of section[2] of sample.docx;
- Example 3: Query a function point. Get the second paragraph in the second section of the document sample.odf; the results include all properties of that paragraph.
  SELECT paragraph[2] FROM section[2] of sample.odf;
- Example 4: Conditional query. Get all paragraphs whose font name is “Times New Roman”.
  SELECT paragraph FROM sample.docx WHERE fontName = “Times New Roman”;
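To make the shape of these statements concrete, here is a minimal Python sketch that splits a SELECT statement into its three clauses. It is an illustration under stated assumptions, not the ODQ parser: the names `Query` and `parse_odq` are hypothetical, the file name at the end of the path is treated as just another path segment, attribute and node names are not validated against the grammar, and conditions are split naively on AND/OR.

```python
import re
from dataclasses import dataclass, field

# SELECT <items> FROM <path> [WHERE <conditions>] [;]
_STATEMENT = re.compile(
    r"^\s*SELECT\s+(?P<select>.+?)\s+FROM\s+(?P<path>.+?)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
# A name with an optional bracketed index, e.g. paragraph[2] or fontName.
_ITEM = re.compile(r"^(?P<name>[^\[\]\s]+)(?:\[(?P<index>[^\]]+)\])?$")
# attribute = "value", accepting straight or curly double quotes.
_CONDITION = re.compile(r'^(?P<attr>\w+)\s*=\s*["“”](?P<value>.*?)["“”]$')


@dataclass
class Query:
    select: list                                     # (name, index spec or None)
    path: list                                       # segments as written, innermost first
    conditions: list = field(default_factory=list)   # (attribute, value) pairs
    connectives: list = field(default_factory=list)  # "AND"/"OR" between conditions


def _parse_item(text):
    match = _ITEM.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse item: {text!r}")
    return match.group("name"), match.group("index")


def parse_odq(statement):
    """Split an ODQ SELECT statement into select list, path, and conditions."""
    match = _STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"not a SELECT statement: {statement!r}")
    select = [_parse_item(part) for part in match.group("select").split(",")]
    segments = re.split(r"\s+of\s+", match.group("path"), flags=re.IGNORECASE)
    query = Query(select=select, path=[_parse_item(seg) for seg in segments])
    if match.group("where"):
        # Naive split: a quoted value containing " and " or " or " would break it.
        parts = re.split(r"\s+(AND|OR)\s+", match.group("where"), flags=re.IGNORECASE)
        for text in parts[0::2]:
            cond = _CONDITION.match(text.strip())
            if cond is None:
                raise ValueError(f"cannot parse condition: {text!r}")
            query.conditions.append((cond.group("attr"), cond.group("value")))
        query.connectives = [word.upper() for word in parts[1::2]]
    return query


example = parse_odq(
    "SELECT fontName, fontSize FROM paragraph[2] of section[2] of sample.docx;"
)
```

Here `example.path` keeps the segments in the order they are written, so a processor that walks from the document down to the paragraph would traverse the list in reverse.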
3. Query Parsing and Result Generation
- IDocument* doc = new IDocument();
- ISectionSet* sections = doc->getSectionSet();
- ISection* section = sections->getItemByID(1);
- …….
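The calls above walk from the document down to a numbered section. The same idea can be sketched in Python over a plain nested dictionary instead of the IDocument interfaces. Everything in this sketch is an assumption made for illustration: the dictionary layout, the function names `expand_index`, `descendants` and `resolve`, 1-based indices, and the reading that a node without an index (or with all) selects every item. The path is given innermost first, as in a FROM clause, without the file name.

```python
def expand_index(spec, count):
    """Turn an index spec (None, 'all', '3', '1-2') into 0-based positions."""
    if spec is None or spec == "all":
        return list(range(count))
    if "-" in spec:
        first, last = (int(part) for part in spec.split("-", 1))
        return list(range(max(first, 1) - 1, min(last, count)))
    position = int(spec)
    return [position - 1] if 1 <= position <= count else []


def descendants(parent, node):
    """Collect all `node` items below `parent`, in document order.

    Searching at any depth lets a flat path such as paragraph[1-2]
    skip intermediate levels like section.
    """
    found = []
    for key, children in parent.items():
        if not isinstance(children, list):
            continue  # plain property such as "text"
        for child in children:
            if key == node:
                found.append(child)
            else:
                found.extend(descendants(child, node))
    return found


def resolve(document, path):
    """Resolve a path, written innermost first, from the document root downwards."""
    current = [document]
    for node, spec in reversed(path):
        selected = []
        for parent in current:
            candidates = descendants(parent, node)
            selected.extend(candidates[i] for i in expand_index(spec, len(candidates)))
        current = selected
    return current


sample = {
    "section": [
        {"paragraph": [{"text": "a"}, {"text": "b"}]},
        {"paragraph": [{"text": "c"}, {"text": "d"}, {"text": "e"}]},
    ]
}

# paragraph[2] of section[2]
second = resolve(sample, [("paragraph", "2"), ("section", "2")])
print([p["text"] for p in second])  # prints ['d']
```

A real processor would replace the dictionary lookups with calls into the format-specific API selected through the map-to relationship.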
4. Experiments and Evaluation
Approach | Code |
---|---|
ODQ command (note: can access documents of any format) | SELECT text FROM paragraph[1-2] of filename |
OOXML API (note: can only access OOXML documents) | for (int i = 1; i <= 2; i++) { Paragraph p = doc.Range().Paragraphs[i]; text += p.Range.Text; } |
UOF API (note: can only access UOF documents) | for (int i = 0; i < 2; i++) { IParagraph p = (IParagraph) textDoc.getParagraphs().getItemByIndex(i); ITextRunSet runs = p.getTextRuns(); for (int j = 0; j < runs.getCount(); j++) { ITextRun r = (ITextRun) runs.getItemByIndex(j); text += r.getTextContent(); } } |

Testing Functions | OOXML API | UOF API | ODQ |
---|---|---|---|
Get author of a given document | 1 | 1 | 1 |
Get title of a given document | 1 | 1 | 1 |
Get creator of a given title of a document | 4 | 4 | 1 |
Get creation time of a given title of a document | 4 | 4 | 1 |
Get text content of a given document | 4 | 10 | 1 |
Get text content of a given section | 4 | 10 | 1 |
Get text content of a given paragraph | 1 | 7 | 1 |
Get all paragraphs whose font name is “Times New Roman” | 4 | 14 | 1 |
Get text content whose font name is “Times New Roman” from the first section | 4 | 15 | 1 |
Get font name of a given paragraph | 4 | 4 | 1 |
Get a given paragraph | 1 | 1 | 1 |
Get a given table | 1 | 1 | 1 |
Get a cell from a given table | 2 | 2 | 1 |

Approach | Code |
---|---|
ODQ command | SELECT text FROM paragraph[1-2] of filename |
XQuery | declare namespace w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"; for $x in doc("document.xml")/w:document/w:body/w:p[2]/w:r for $y in $x/w:t return xs:string($y) |
5. Conclusions
- It hides the differences between fluid office document formats and facilitates interoperability among all kinds of office documents.
- It offers a unified interface for users to handle documents in different formats.
- Thanks to its features, e.g., being non-procedural and platform- and language-independent, ODQ is easy to embed into document-based applications to access fluid office documents either remotely or locally.
- It has a simple syntax and is therefore easy to use.
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Wang, D.L.; Jiang, H.F.; Zhang, C.Y. UOML: An unstructured operation markup language. Inf. Technol. Inf. 2007, 3, 121–125.
- OASIS. Information Technology—UOML (Unstructured Operation Markup Language) Part 1 Version 1.0. Available online: http://docs.oasis-open.org/uoml-x/v1.0/os/uoml-part1-v1.0-os.html (accessed on 3 October 2013).
- ISO/IEC. Information Technology—Open Document Format for Office Applications (OpenDocument) v1.0. Available online: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=43485 (accessed on 12 August 2014).
- ISO/IEC. Information Technology—Office Open XML file formats. Available online: http://www.iso.org/iso/news.htm?refid=Ref1181 (accessed on 12 August 2014).
- Specification for the Chinese office file Format (GB/T). Available online: http://doc.csres.com/showdoc-2541-44390.html (accessed on 2 October 2014).
- Tang, Y.; Tian, Y.A.; Li, N. Analysis of methods to access XML-based fluid office documents. Comput. Eng. Design 2014, 4, 1458–1464.
- Sun, Q.G.; Zhu, W.; Liu, H.J.; Zhang, P. Data integration of open document format on XQuery. Comput. Syst. Appl. 2008, 7, 32–34.
- Ling, F.; Liu, X.H.; Tian, Y.A. Flatten design of open document query. In Proceedings of the International Conference on Cyberspace Technology (CCT2013), Beijing, China, 23 November 2013.
- Li, N.; Liang, Q.; Shi, Y.M. The function of format information in document understanding. J. Beijing Inf. Sci. Technol. Univ. 2012, 6, 1–7.
- Wang, H.; Gu, J.; Su, X.N. Research on the Model and Its Application of Ontology-driven Knowledge Management System. J. Libr. Sci. China 2013, 3, 98–110.
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
Liu, X.; Li, N.; Shi, Y.; Hou, X. ODQ: A Fluid Office Document Query Language. Information 2015, 6, 275-286. https://doi.org/10.3390/info6020275