Smart Project Management: Interactive Platform Using Natural Language Processing Technology

Abstract: Technological developments have made the construction industry efficient. The aim of this research is to solve communication and interaction problems by building a project management platform based on the interactive concept of natural language processing technology. Drawing on a comprehensive literature review and expert interviews on techniques for handling natural language, the proposed system combines the Progressive Scale Expansion Network (PSENet), Convolutional Recurrent Neural Network (CRNN), and Bi-directional Recurrent Neural Network and Convolutional Neural Network (BRNN-CNN) toolboxes to extract keywords from construction project contracts. The results show that a fully automatic platform facilitating contract management was achieved. For academic domains, the Contract Keyword Detection (CKD) mechanism, which integrates PSENet, CRNN, and BRNN-CNN to cope with real-time, massive document flows, is novel in the construction industry. For practice, the proposed approach significantly reduces manpower and human error, offers an alternative means of settling misunderstandings or disputes through real-time and precise communication, and provides a solution for efficient document management. It connects all contract stakeholders proficiently.


Introduction
Most construction industry management is currently based on manual, rule-based management, assisted by technology systems that implement automatic construction management [1][2][3][4][5]. However, the quality of construction management systems on the market is uneven, and a common blind spot is that they cannot effectively connect and interact with each party [6]. Data input to and output from these systems cannot easily be shared according to the needs of the work, which creates obstacles among parties [7][8][9][10]. As a result, cooperation among the parties to a construction project may not run smoothly across all phases, including planning and procurement. Parties may face the consequences of poor management, which can be as serious as litigation [11][12][13][14]. There is therefore a pressing need to streamline the management process and lower the barriers among all contract parties. In summary, the construction and property industries currently face problems caused by the inability of existing management systems to share and exchange information [15][16][17]. The aim of this research is to solve these communication and interaction problems by building a project management platform based on the interactive concept of natural language processing technology. Through a comprehensive literature review, expert interviews, and the development of bridges (keyword detection) connecting the parties, an interactive platform for construction contract stakeholders is developed to support more in-depth management.

Applications for Scene Text Detection
Scene text detection is intuitive to understand: given an image, the task is to find every position where text appears and mark each text instance with a bounding box. Because each region either contains text or does not, this is a single-class detection task. The Faster Region-based Convolutional Neural Network (Faster RCNN) can be used for text detection [22,23]. The Faster RCNN model is mainly composed of two modules, the Region Proposal Network (RPN), which extracts candidate boxes, and the Faster RCNN detection module; together they can be subdivided into four parts: (1) convolutional layers, (2) the RPN, (3) Region of Interest (RoI) pooling, and (4) classification and regression. Faster RCNN proceeds in three steps [24]: (1) the backbone network extracts features; (2) the features are sent to the RPN to extract candidate boxes; and (3) the classification layer classifies the object in each candidate box, while the regression layer fine-tunes the box coordinates (x, y, w, h). However, Faster RCNN does not perform well as a text detector because text differs from ordinary objects. For example, common objects have obvious closed edge contours, but text does not. A text line consists of multiple characters separated by gaps; if the model cannot bridge these gaps, each character is treated as a separate text line and framed individually instead of the entire line. For these reasons, general-purpose networks such as Faster RCNN must be adapted into a new network framework suited to text detection.
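The regression in step (3) refines each candidate box's (x, y, w, h). A minimal Python sketch of the box parameterization commonly used with Faster RCNN-style detectors follows: centre offsets are scaled by the anchor size and width/height changes are predicted in log space. The anchor and ground-truth values are made up for illustration.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_centre, y_centre, width, height)


def encode_box(anchor: Box, gt: Box) -> Box:
    """Regression target (tx, ty, tw, th) that maps an anchor onto a ground-truth box."""
    xa, ya, wa, ha = anchor
    x, y, w, h = gt
    # Centre offsets are normalised by the anchor size; scale changes use logs.
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode_box(anchor: Box, deltas: Box) -> Box:
    """Apply predicted deltas to an anchor, i.e. the inverse of encode_box."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


anchor = (100.0, 50.0, 64.0, 32.0)
gt = (110.0, 54.0, 128.0, 16.0)
deltas = encode_box(anchor, gt)
print(deltas)                      # approximately (0.156, 0.125, 0.693, -0.693)
print(decode_box(anchor, deltas))  # approximately (110.0, 54.0, 128.0, 16.0)
```

Predicting width and height in log space keeps them positive after decoding, since the exponential is always positive.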
A method for detecting text in natural images was proposed in [24]. This deep neural network, called the Connectionist Text Proposal Network (CTPN), accurately locates lines of text in natural images. CTPN detects text lines directly from a sequence of fine-grained text proposals in the convolutional feature maps. In that work, a vertical anchor mechanism was developed to jointly predict the vertical position and the text/non-text score of each fixed-width proposal, which greatly improves localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is integrated into the convolutional network to form an end-to-end trainable model. This allows CTPN to exploit rich image context and detect blurry text. CTPN works reliably on multiple scales and multiple languages without further post-processing, departing from previous bottom-up methods that required multiple filtering steps.
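CTPN predicts many narrow, fixed-width proposals and then joins them into text lines. The following Python sketch illustrates that joining idea with a simplified greedy chaining, not CTPN's exact pairing rule. The 16-pixel proposal width, the 50-pixel maximum gap, and the 0.7 vertical-overlap threshold are assumptions chosen for illustration, and the proposals are made up.

```python
from typing import List, Tuple

# (x_left, y_top, y_bottom); every proposal has the same fixed width.
Proposal = Tuple[float, float, float]
LineBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def vertical_overlap(a: Proposal, b: Proposal) -> float:
    """Shared vertical extent of two proposals, relative to the shorter one."""
    inter = min(a[2], b[2]) - max(a[1], b[1])
    shorter = min(a[2] - a[1], b[2] - b[1])
    return max(0.0, inter) / shorter


def build_text_lines(props: List[Proposal], width: float = 16.0,
                     max_gap: float = 50.0, min_overlap: float = 0.7) -> List[LineBox]:
    """Greedily chain fixed-width proposals, left to right, into text lines."""
    lines: List[List[Proposal]] = []
    for p in sorted(props):
        for line in lines:
            last = line[-1]
            gap = p[0] - (last[0] + width)  # horizontal gap to the line's last proposal
            if gap < max_gap and vertical_overlap(last, p) > min_overlap:
                line.append(p)
                break
        else:  # no existing line accepted p, so it starts a new one
            lines.append([p])
    # Each chain becomes one box spanning all of its proposals.
    return [(line[0][0], min(q[1] for q in line),
             line[-1][0] + width, max(q[2] for q in line)) for line in lines]


proposals = [(0, 10, 30), (16, 11, 31), (32, 10, 30),  # one word
             (200, 10, 30),                             # far to the right
             (0, 100, 120), (16, 100, 121)]             # a second row
print(build_text_lines(proposals))
```

The vertical-overlap test keeps stacked rows apart even when their proposals share the same x position, while the gap test splits words that are far apart on the same row.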
However, CTPN has an obvious disadvantage: it performs poorly on non-horizontal text, and the text in its detection results is horizontal [25,26]. To overcome this weakness, a study introduced a detection approach that can detect text at any angle, called Segment Linking (SegLink) [24]. The main idea is to decompose text into two locally detectable elements: segments and links. For example, in Figure 1, the first picture shows the segment boxes detected in the image; the second shows the links detected between adjacent segments; and the third shows whole words formed by merging segments connected by links [24]. A characteristic of text detection is that the aspect ratio of text boxes is particularly large or small and that the text is often rotated. If only the four parameters (x, y, w, h) of generic object detection were used to specify a target position, the error would clearly be too large, so the model also learns a fifth parameter θ, the rotation angle of the text box. Extending the regressed parameters from (x, y, w, h) to (x, y, w, h, θ) addresses this orientation error. The approach incorporates both the small-scale candidate box idea of CTPN and the Single Shot MultiBox Detector (SSD) approach, and it achieved state-of-the-art results for text detection in natural scenes. Analyzing the network architecture of SegLink shows how it achieves efficient multi-oriented text detection. SegLink follows the design of SSD. First, Visual Geometry Group 16 (VGG16) is used as the feature-extraction backbone. The fully connected layers (fc6, fc7) of VGG are replaced by convolutional layers (conv6, conv7), and further convolutional layers, conv8 to conv11, are appended.
Note that the feature map size decreases from conv4 to conv11, with each layer half the size of the previous one. This design supports multiscale detection, because large feature maps are good at detecting small objects. By detecting segments and links on six feature layers of different scales, text lines of different sizes can be detected.
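Two ideas from the SegLink description can be made concrete in a short Python sketch. The first is why a horizontal (x, y, w, h) box fits tilted text poorly, which motivates the extra angle parameter θ. The second is how segments connected by links are merged into words. The sketch makes three assumptions:

- The rotation uses the standard mathematical formula. With y pointing down, as in images, a positive θ therefore appears clockwise, and implementations differ on this convention.
- Union-find is our illustrative choice for the merging step.
- The segment indices and links are hypothetical.

SegLink's own combining step also fuses each group of segments into one oriented box. That fusion is omitted here.

```python
import math
from typing import Dict, List, Tuple


def rotated_box_corners(x: float, y: float, w: float, h: float,
                        theta: float) -> List[Tuple[float, float]]:
    """Corners of a w-by-h box centred at (x, y) and rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the offset from the centre, then translate back.
        corners.append((x + dx * c - dy * s, y + dx * s + dy * c))
    return corners


def axis_aligned_area(corners: List[Tuple[float, float]]) -> float:
    """Area of the smallest horizontal (x, y, w, h) box enclosing the corners."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def group_segments(n_segments: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Union-find: segments joined by links, directly or transitively, form one word."""
    parent = list(range(n_segments))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    groups: Dict[int, List[int]] = {}
    for i in range(n_segments):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# A 100 x 10 text line tilted by 30 degrees has a true area of 1000 square pixels,
# but the horizontal box enclosing it covers more than five times as much.
corners = rotated_box_corners(0.0, 0.0, 100.0, 10.0, math.radians(30))
print(axis_aligned_area(corners))  # roughly 5373
print(group_segments(6, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4], [5]]
```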
The SegLink work does not mention whether the approach can detect curved text [27]. Moreover, its strategy of first detecting segments and then merging them into complete text lines inevitably costs detection accuracy and increases time consumption. A 2017 study proposed the Efficient and Accurate Scene Text (EAST) detector, a simple yet powerful method that handles multi-oriented text [25]. Text detection typically involves multiple stages. The region-proposal approach, for example, includes candidate box extraction, bounding box regression, and candidate box merging. Compared pipelines include:

(a) the horizontal word detection and recognition pipeline proposed by [28];
(b) a multi-oriented text detection pipeline [29];
(c) horizontal text detection using CTPN;
(d) the EAST pipeline, which eliminates most intermediate steps, consists of only two stages, and is much simpler than previous solutions.

The authors of EAST argue that splitting a text detection approach into multiple phases brings few real benefits and that a true end-to-end text detection network is the right design. The EAST process is therefore concise. It consists only of a Fully Convolutional Network (FCN) stage that generates text line parameters and a locality-aware Non-Maximum Suppression (NMS) stage [30], which further improves both the accuracy and the speed of text detection. To understand EAST's advantages in terms of its network architecture, the architecture can be divided into three parts: the feature extraction layer, the feature merging layer, and the output layer.
Feature extraction layer: the backbone uses a deep but lightweight neural network (PVANet) for feature extraction [31]. Each successive convolutional stage halves the spatial size of the previous stage and doubles the number of convolution kernels. Feature maps are extracted at different stages so that maps of different scales are obtained, which addresses the large variation in text line scale: large feature maps can be used to predict small text lines, and small feature maps to predict large ones. Feature merging layer: the extracted features are merged following the U-Net method [32], with the top-level features of the feature extraction network merged downward, stage by stage, according to the corresponding rules [27].
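The second EAST stage, locality-aware NMS, exploits the fact that a dense per-pixel predictor emits many heavily overlapping boxes for the same text. It first merges neighbouring predictions row by row, weighting their coordinates by score, and only then runs ordinary NMS on the much shorter list. The Python sketch below rests on these assumptions:

- It uses axis-aligned boxes instead of EAST's rotated quadrilaterals.
- It assumes an IoU threshold of 0.3.
- Its detections are made up.

It illustrates the idea and is not EAST's implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[Box, float]                  # (box, score)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(a: Det, b: Det) -> Det:
    """Score-weighted average of the coordinates; the scores are summed."""
    (box_a, sa), (box_b, sb) = a, b
    box = tuple((sa * p + sb * q) / (sa + sb) for p, q in zip(box_a, box_b))
    return box, sa + sb


def standard_nms(dets: List[Det], thresh: float) -> List[Det]:
    keep: List[Det] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) <= thresh for k in keep):
            keep.append(det)
    return keep


def locality_aware_nms(dets: List[Det], thresh: float = 0.3) -> List[Det]:
    """dets must arrive in row-major order, as a dense per-pixel predictor emits them."""
    merged: List[Det] = []
    for det in dets:
        # Neighbouring predictions of the same text overlap heavily: merge them first.
        if merged and iou(merged[-1][0], det[0]) > thresh:
            merged[-1] = weighted_merge(merged[-1], det)
        else:
            merged.append(det)
    # The much shorter merged list then goes through ordinary NMS.
    return standard_nms(merged, thresh)


dets = [((0, 0, 10, 10), 0.9), ((1, 0, 11, 10), 0.8), ((2, 0, 12, 10), 0.7),
        ((50, 50, 60, 60), 0.6)]
for box, score in locality_aware_nms(dets):
    print([round(v, 2) for v in box], round(score, 2))
```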

Expert Interview
With a comprehensive understanding of the interactive concepts, the next steps are to determine what platform to develop and how. This starts from suggestions gathered in expert interviews, which were designed to explore feasibility and practical know-how in the construction industry. Scholars have suggested consulting between 6 and 20 experts, each with more than 10 years of experience in the target industry [33][34][35]. Using convenience sampling, interviews were conducted with 10 experts from different professional fields. In the first part of each interview, the expert provided basic information: name, company/organization, job title, and years of service in the construction industry. The second part was an individual interview lasting 40-60 min, based on questions derived from the summary of Section 2:
• Does your company currently have a management system related to project contracts?
• How do you think a management system influences your company?
• If a change order occurred on any of your construction projects, would it prevent the contract stakeholder(s) from performing precise cost control?
• If an interactive platform for the stakeholders of a construction contract could be established and put into use, coordination and communication between the parties should improve, benefiting the project. Would you be willing to introduce it? Do you have any other concerns?
The experts' input can be summarized as follows: a common problem in the construction industry today is the unequal distribution of information and the lack of a platform to integrate it. The interviews show that most respondents are willing to introduce interactive platform(s) for stakeholders to benefit engineering project contracts. Beyond ensuring information sharing, the experts pointed to further benefits: establishing a complete history record for buildings, preserving evidence in combination with document management, protecting intellectual property rights, identifying risks, and issuing early warnings for overdue progress, budget, and schedule. They therefore recommend a robust system that automates document management through contract keyword detection.

Mechanism for Contract Keyword Detection Approach
Integrating the suggestions from the literature review and the expert interviews, the proposed approach adopts natural language processing technology to build an interactive platform for engineering project contract stakeholders. To this end, based on user-defined keywords and the applications introduced in the literature review, we designed a mechanism that integrates the Progressive Scale Expansion Network (PSENet), Convolutional Recurrent Neural Network (CRNN), and Bi-directional Recurrent Neural Network and Convolutional Neural Network (BRNN-CNN) toolboxes to extract contract keywords, as shown in Figure 2. The dotted box in Figure 2 is the proposed approach, Contract Keyword Detection (CKD), which extracts contract keywords in three steps: text detection, text recognition, and interpretation. Starting from inputs that may be images or photos, PSENet detects all possible text locations and outputs the corresponding bounding boxes. The PSENet toolbox used here was built for Chinese text using the ICDAR 2017 Reading Chinese Text in the Wild (RCTW) dataset, which contains 12,263 images, of which 8034 were used as the training set and 4229 as the test set [24]. In this step, shape robust text detection was used as the scene text detection method [31]. Unlike other scene text detection methods, it does not require quadrilateral bounding boxes: it can accurately detect text instances of any shape, and its accuracy is higher than that of other methods. In this stage of the experiment, a real lease contract was prepared, and PSENet was run to detect its text regions (Appendix A). PSENet determined the location of the text with a success rate of nearly 95%. In step 2, CRNN converts the extracted crops (the image regions inside the bounding boxes) into text.
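Step 1 relies on PSENet's progressive scale expansion, which lets it separate adjacent text instances without quadrilateral boxes. The NumPy sketch below illustrates the expansion idea. It assumes that the network's predicted kernel maps are already available as boolean arrays, ordered from the smallest shrunk kernel to the full text region. The toy 1 × 7 input is made up. The released PSENet code performs further post-processing, which is omitted here.

```python
from collections import deque
from typing import List

import numpy as np

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def label_components(mask: np.ndarray) -> np.ndarray:
    """Label 4-connected regions of a boolean mask as 1, 2, ...; background stays 0."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    next_label = 1
    for r in range(h):
        for c in range(w):
            if mask[r, c] and labels[r, c] == 0:
                labels[r, c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return labels


def progressive_scale_expansion(kernels: List[np.ndarray]) -> np.ndarray:
    """kernels: boolean maps ordered from the smallest kernel to the full text region."""
    labels = label_components(kernels[0])  # seeds: one label per text instance
    h, w = labels.shape
    for kernel in kernels[1:]:
        # Breadth-first growth into the next, larger kernel. A pixel reachable from two
        # instances goes to whichever reaches it first, so touching words stay apart.
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and kernel[ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]
                    queue.append((ny, nx))
    return labels


# Two words whose full text regions touch, but whose smallest kernels are separate.
small = np.array([[0, 1, 0, 0, 0, 1, 0]], dtype=bool)
full = np.ones((1, 7), dtype=bool)
print(progressive_scale_expansion([small, full]))  # [[1 1 1 1 2 2 2]]
```

Segmenting the full text mask alone would merge the two words into one region. Growing outward from the separated small kernels is what keeps them distinct.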
The final step uses the BRNN-CNN tool to interpret the meaning of the text and determines the matching keywords by referring to the term banks. The outputs (detected contract keywords) become the "bridges" that connect the parties through the interactive platform.
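The source does not specify how the term banks are matched. The Python sketch below shows one possibility, greedy longest-match lookup. It assumes that the earlier steps have already produced a token sequence and that a term bank maps token sequences to keyword categories. The function name `match_keywords` and the sample bank are hypothetical, and the tokens come from the example contract sentence quoted in the implementation section.

```python
from typing import Dict, List, Tuple

TermBank = Dict[Tuple[str, ...], str]  # token sequence -> keyword category


def match_keywords(tokens: List[str], term_bank: TermBank,
                   max_len: int = 4) -> List[Tuple[str, str, int]]:
    """Greedy longest-match lookup of token n-grams in a term bank.

    Returns (matched text, category, start index) triples in reading order.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, so a multi-word term beats any shorter
        # term it contains.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in term_bank:
                matches.append((" ".join(tokens[i:i + n]), term_bank[key], i))
                i += n
                break
        else:
            i += 1  # no term starts here
    return matches


# Hypothetical term bank; a real one would be curated per contract type and language.
bank: TermBank = {
    ("party", "a"): "contract party",
    ("party", "b"): "contract party",
    ("lease",): "contract type",
    ("cloud", "management", "system"): "deliverable",
}
tokens = "Party A and Party B lease a circle to build a cloud management system".split()
for hit in match_keywords(tokens, bank):
    print(hit)
```

Lower-casing only matters for alphabetic scripts. For Chinese tokens it has no effect, and the lookup behaves the same way.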

Implementation and Discussion
Because the CKD approach is built from the PSENet, CRNN, and BRNN-CNN toolboxes, their settings were kept as suggested in the original works [24,31]. The implementation of the proposed approach also involves three phases. (1) The text positions detected by PSENet in the first stage were taken, the text was cropped section by section, and the crops were sent to the second stage for text recognition. Figure 3 shows the original contract as detected by the first stage.
(2) Using the results shown in Figure 4, the contract keyword detection approach converts the segmented text images into machine-readable text. The output text is saved as a .txt file, as shown in Figure 5. Some recognition errors can be seen in the output, but the text can still be roughly distinguished, so subsequent processing is not affected.
(3) The final stage tests the proposed system in an environment other than English. The CKD approach is designed to work in different languages, as long as the databank supports them. The USA was the first experimental environment, in which construction contract parties may use language that differs from that of official documents. The model was established by following the steps above. In this stage, CKD sends the recognized text to BRNN-CNN, the word segmentation and entity recognition system developed by the Central Academy of Sciences, for part-of-speech analysis [36]. The output of word segmentation is shown in Figure 6, and the English meanings of the part-of-speech tags are given in Table 1. For example, Figure 6 shows the results for the first sentence: "Party A and Party B agree to conclude this contract hereinafter referred to as this contract since Party A and Party B lease a circle to build a cloud management system." The system tags <Cause> as Cbb, a relational connective; <A> and <B> as Neu, representing numerals or definite words; and <System> as Na, representing common nouns. VC denotes action transitive verbs. These results show that the CKD approach can recognize the part of speech of each word, with high accuracy on this example.
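Phase (2) above relies on CRNN to turn each cropped region into text. CRNN's transcription layer is trained with Connectionist Temporal Classification (CTC), and the simplest way to read its per-frame output is best-path decoding. The source does not say which decoder the CKD toolbox uses, so the following Python sketch is illustrative only. The four-letter alphabet and the toy frame probabilities are made up.

```python
from typing import List, Sequence

BLANK = 0  # index 0 is reserved for the CTC "blank" symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: take the argmax per frame, collapse repeats, drop blanks.

    Class k >= 1 corresponds to alphabet[k - 1].
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev = None
    for k in best:
        if k != prev and k != BLANK:
            out.append(alphabet[k - 1])
        prev = k
    return "".join(out)


def one_hot_frames(indices: Sequence[int], n_classes: int) -> List[List[float]]:
    """Toy per-frame distributions that put most of the mass on the given class."""
    return [[0.9 if c == k else 0.1 / (n_classes - 1) for c in range(n_classes)]
            for k in indices]


alphabet = "aels"  # a=1, e=2, l=3, s=4; 0 is blank
frames = one_hot_frames([3, 3, 0, 2, 1, 1, 4, 0, 2], n_classes=5)
print(ctc_greedy_decode(frames, alphabet))  # lease
```

Repeated frames of one character collapse into a single letter. A genuine double letter therefore needs a blank frame between its two occurrences.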
The second output of the system is named entity recognition, and the categories it assigns are described in detail in Table 1. As shown in Figure 7 (the CKD output), <first>, <second>, and <third> are labeled as ordinal, i.e., ordinal numbers; <AB> is labeled as person; and <six hours> is labeled as time, representing a time expression. Named entity recognition thus reveals the category of each entity, and the required keywords can be identified by category. Information can then be found quickly, reducing search time in large amounts of data.
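The search-time argument can be made concrete. Once entities are tagged, they can be indexed by category, so that, for example, every time expression across many contracts is retrieved with one lookup instead of a scan. The Python sketch below assumes that the NER step returns (text, category) pairs. The category spellings follow the text, but the function name `build_entity_index`, the document id, and the data layout are hypothetical, not the CKD system's actual output format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity text, entity category)


def build_entity_index(docs: Dict[str, Iterable[Entity]]) -> Dict[str, List[Tuple[str, str]]]:
    """Invert NER output: entity category -> list of (document id, entity text)."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for doc_id, entities in docs.items():
        for text, category in entities:
            index[category].append((doc_id, text))
    return dict(index)


# Entities as described for Figure 7; the document id is hypothetical.
ner_output = {
    "lease-contract": [("first", "ordinal"), ("second", "ordinal"), ("third", "ordinal"),
                       ("AB", "person"), ("six hours", "time")],
}
index = build_entity_index(ner_output)
print(index["time"])  # [('lease-contract', 'six hours')]
```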
Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 12
The study has notable implications for practicing engineering managers. The interactive platform for construction contract stakeholders supports more in-depth management and interactive communication. It thus reduces human errors, improves the accuracy of document processing, and strengthens relationships between parties. Human errors can be viewed as personal or systematic. Although flaws or errors still occur at times, the CKD approach acts as a countermeasure: it cuts out the middleman, reduces the work hours required to revise documents, and allows errors to be rectified. To gain a competitive advantage, engineering managers need to use their resources efficiently and embrace the potential of innovation. Consequently, the proposed approach contributes significantly to project coordination among all parties, for example, by sharing real-time information, dealing rapidly with uncertainties and changes, automatically processing routine tasks, and saving manpower and costs.

Conclusions
The study integrates natural language processing techniques to develop the CKD approach, which combines the PSENet, CRNN, and BRNN-CNN toolboxes to support interactive connections among the parties to construction projects. The CKD mechanism is original and effective in construction practice, facilitating contract management, especially document handling for mega projects. It serves two purposes. First, it is an interactive platform that automatically connects, tracks, and handles document tasks in real time. Second, it is a countermeasure against personal or systematic human errors, because CKD performs text extraction, recognition, and interpretation for contract document management without human intervention. It efficiently handles massive flows of documents and tasks while a construction project is ongoing. As a result, manpower requirements are significantly reduced. The research contributes to both academic and practical domains. For academic domains, the CKD mechanism integrating PSENet, CRNN, and BRNN-CNN to cope with real-time, massive document flows is novel in the construction industry. For practice, the proposed approach significantly reduces manpower and human error, offers an alternative means of settling misunderstandings or disputes through real-time and precise communication, and provides a solution for efficient document management. It connects all contract stakeholders proficiently. Follow-up studies can focus on new toolboxes that may replace the current methods more efficiently. Testing in languages other than Chinese is recommended to determine whether the mechanism works efficiently. Using the CKD mechanism as the core, practitioners may develop related applications for specific construction domains such as financing, dispute management, quality control, and schedule mapping.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the sponsored investigation.

Conflicts of Interest:
The authors declare no conflict of interest.