Repetitive Model Refinement for Questionnaire Design Improvement in the Evaluation of Working Characteristics in Construction Enterprises

This paper presents an iterative confidence interval based parametric refinement approach for questionnaire design improvement in the evaluation of working characteristics in construction enterprises. This refinement approach utilizes the 95% confidence interval of the estimated parameters of the model to determine their statistical significance in a least-squares regression setting. If this confidence interval of particular parameters covers the zero value, it is statistically valid to remove such parameters from the model and their corresponding questions from the designed questionnaire. The remaining parameters repetitively undergo this sifting process until their statistical significance cannot be improved. This repetitive model refinement approach is implemented in efficient questionnaire design by using both linear series and Taylor series models to remove non-contributing questions while keeping significant questions that are contributive to the issues studied, i.e., employees’ work performance being explained by their work values and cadres’ organizational commitment being explained by their organizational management. Reducing the number of questions alleviates the respondent burden and reduces costs. The results show that the statistical significance of the sifted contributing questions is decreased with a total mean relative change of 49%, while the Taylor series model increases the R-squared value by 17% compared with the linear series model.

OPEN ACCESS Sustainability 2015, 7 15180


Introduction
The questionnaire approach is widely used for surveying and collecting sample data with regard to an issue, with a list of questions to be answered and the results aggregated for statistical analysis. However, the main factors or questions influencing the findings of the models used need to be validated and simplified for efficient questionnaire design. In order to acquire accurate evaluations of working characteristics in construction enterprises and to alleviate problems of relatively large-dimensional and nonlinear models, this study develops a confidence interval based repetitive parametric model refinement approach for questionnaire design improvement.

General Information about the Questionnaires
A total of 250 questionnaires were distributed to Taiwanese and Chinese employees of two ranks in the company being studied. After excluding 30 invalid questionnaires (incomplete, with missing values, or regarded as "outliers" through a set of mathematical analyses) and 39 unreturned ones, a total of 181 questionnaires were valid. The valid response rate was 72.4%.
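The counts above can be checked with a short worked calculation. The split into "returned" and "valid" questionnaires is read directly from the figures in the text; the 84.4% return rate is derived here and is not stated in the original.

```latex
\begin{align*}
\text{returned} &= 250 - 39 = 211, & \frac{211}{250} &= 84.4\%, \\
\text{valid}    &= 211 - 30 = 181, & \frac{181}{250} &= 72.4\%.
\end{align*}
```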

Questionnaire Design Improvement
Questionnaire surveys are a widely used method to collect opinions and views. A customized questionnaire is developed based on the parameters revealed by context immersion in a given field (Kim [1]). However, many factors such as tedious design formats (Saris [2], Saris and Gallhofer [3]), redundant content, and excessive length (Weimiao and Zheng [4]) may lead to an inconsistent comparison matrix for the decision problem. Invalid or bad results from a questionnaire survey may cause decision makers to make faulty inferences (Ergu and Kou [5]). Suzuki et al. [6] introduced procedures to design reasonable questionnaires using statistical analysis to obtain high accuracy. Reducing the length of a survey by using a more streamlined set of questions can lead to more reasonable data being acquired and to better explanations of the issues in question. Edwards et al. [7] noted that non-response reduces the effective sample size and can introduce bias, so finding ways to increase response rates to postal questionnaires would improve the quality of health research. Landsheer and Boeije [8] used qualitative facet analysis, an application of Guttman's facet theory, to investigate whether item content sufficiently covered the intended subject area. This form of content analysis constitutes a systematic, effective, and critical tool for improving the content of questionnaires. Jacqui et al. [9] improved questionnaire design by enabling iterations of qualitative and quantitative testing, evaluation, and redevelopment. Results from such tests enable evidence-based decisions to be made regarding trade-offs between measurement error, processing error, non-response error, respondent burden, and costs. By enabling targeted improvements at the questionnaire design level according to specific needs, we can create valuable reference resources (Xu et al. [10]).

Model Refinement and Repetitive Computation
To alleviate problems of respondent burden and costs as well as relatively large-dimensional and nonlinear models, the issue of model refinement has drawn increasing attention in many fields. Smith [11] addressed the study of algorithms and system designs. Adrian [12] presented a refinement process with respect to data list building using model generators. Kapova and Goldschmidt [13] proposed model-driven application engineering based on the concept of analytical transformations. Liu [14] established two optimization models for a wireless optical communication system based on a four-level pulse amplitude modulation scheme. Ragnhild et al. [15] explored the behavior inheritance consistency of both refined and re-factored models with respect to the original model. Steven et al. [16] addressed model refinement as an iterative process. Zhuquan et al. [17] proposed that measurements permitted the repeated application of a system identification procedure operating on closed-loop data, together with successive refinements of the designed controller.

Nonlinear Models and Statistical Confidence Intervals
A nonlinear model is often adopted in system applications. Khorshid and Alfares [18] developed a parameter identification technique in creating a mathematical model of vehicle components by solving an inverse problem using a non-linear optimization method. Lin and Chen [19] proposed a statistical confidence interval based nonlinear parameter refinement approach and applied it to the standard power series model (Lin [20], Lin and Betti [21]) for the identification of structural systems. Other statistical confidence interval based studies include Tryon [22], who employed a graphical inference confidence interval approach in analyzing independent and dependent approaches for statistical difference, equivalence, replication, indeterminacy, and trivial difference. Yang et al. [23] proposed control limits based on the narrowest confidence interval to address a problem with the traditional three-sigma control limits or probability limits: in asymmetrical or multimodal distributions, some points with a relatively high probability of occurrence are excluded, while some points with a relatively small probability of occurrence may still be accepted. Bonett and Price [24] proposed an adjusted Wald interval for paired binomial proportions that was shown to perform as well as the best available methods. In construction management, it has been shown to be feasible to use nonlinear models to deal with construction cost overruns (Ahiaga-Dagbui and Smith [25], Anastasopoulos et al. [26]) and schedule forecasting patterns (Kim and Kim [27], Patel and Jha [28]).

Prime Novelty Statement
In contrast with the conventional tests of reliability and validity, the designed questionnaires in this study were analyzed to identify the main factors and associated questions influencing the model studied using the proposed repetitive model refinement approach so as to streamline the number of questions in surveys of working characteristics in construction enterprises. Problems of respondent burden and costs as well as relatively large-dimensional and nonlinear models were thus alleviated. To reduce the number of questions with a more streamlined set, it was feasible to refine the model by repetitively removing non-contributing questions. Each time non-contributing questions were removed, the questionnaire model would be updated and rerun once again in a multiple regression setting. This model refinement approach for the content validity of the questionnaire was implemented using both linear and Taylor series models by conserving significant questions that were contributive to the issue being studied, i.e., employees' work performance explained by their work values and cadres' organizational commitment explained by their organizational management. The results have been verified by calculating the statistical significance values of the sifted contributing questions and the R-squared values of established models.

Questionnaires Evaluating Working Characteristics in Construction Enterprises
In this study, the research subjects of the questionnaires were the Taiwanese employees and cadres of Taiwan-based construction enterprises in China. Questionnaire findings of similarities and differences in work values, work satisfaction, organizational management, and organizational commitment were preliminarily reviewed. The effects of work values and organizational management on work satisfaction and organizational commitment, respectively, were analyzed using questionnaires based on the job diagnostic survey by Hackman and Oldham [29]. The "working characteristics questionnaires" included questionnaires for (1) work values; (2) work performance and satisfaction; (3) organizational management; and (4) organizational commitment and identification (Lin and Shen [30], Shen [31]).

Repetitive Model Refinement Approach and Analyses
Questionnaire data were used in multiple regression analyses using four models, comprising the linear series, the refined linear series, the Taylor series, and the refined Taylor series model, where for the employees' part the independent variables are X = work values, which are used to explain the dependent variables Y = work performance and satisfaction; and for the cadres' part, X = organizational management, used to explain Y = organizational commitment and identification.
Two linear regression models were generated to identify the causal links between work values and work performance on the one hand, and organizational management and organizational commitment on the other. The original linear series model was refined through an iterative approach. This refined model was developed to streamline the questionnaire by removing non-contributing questions. The Taylor series model expanded the original linear series model up to the third moments. As a consequence, the R-squared value in the regression setting was increased. The refined Taylor series model was obtained from the original Taylor series model by the repetitive refinement approach in a regression setting. It was thus feasible to obtain the R-squared values of the regression between X and Y defined above and the mean relative change of the statistical significance as two indicators of result verification, so as to prove the accuracy of the refined model and to validate the sifted questions as genuinely significant contributors to the refined model.
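The difference between the linear series and Taylor series models can be illustrated with a small sketch. The exact form of the expansion is not given in the text, so the code below assumes that each question score enters as x, x^2, and x^3 with no cross terms; the function names `taylor_design` and `r_squared` and the synthetic data are hypothetical and stand in for the survey responses.

```python
import numpy as np

def taylor_design(X, order=3):
    """Stack X, X**2, ..., X**order column-wise (assumed form, no cross terms)."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X ** k for k in range(1, order + 1)])

def r_squared(X, y):
    """R-squared of a least-squares fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Synthetic data with a purely quadratic dependence on the first predictor.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(181, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=181)

r2_linear = r_squared(X, y)                 # linear series model
r2_taylor = r_squared(taylor_design(X), y)  # Taylor series model, order 3
print(f"linear: {r2_linear:.3f}  Taylor: {r2_taylor:.3f}")
```

Because the linear model is nested in the expanded one, the R-squared value of the Taylor series fit can never be lower than that of the linear fit on the same data; how much it gains depends on how nonlinear the underlying relationship is.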
The iterative refinement approach provides for the sifting of model components and related questions by repetitively using the 95% confidence interval in a regression setting. The 95% confidence level is selected by convention and because a higher confidence level enables more stringent selection of the components and thus a lower possibility of incorporating nonlinear elements, which is generally problematic for systems with a degree of nonlinear behavior. Such nonlinearity will be verified in the results, which show that the nonlinear Taylor series model significantly increases the R-squared value when compared with the linear series model. If the estimated confidence interval of a parameter contains the "null" (zero) value, it is statistically valid to remove such a parameter and its corresponding component, while maintaining those parameters whose confidence intervals do not cover the zero value. This component/question sifting process is repeated by rerunning the regression and refining the model until none of the estimated 95% confidence intervals of the remaining parameters cover the zero value (Lin and Chen [19]). In addition, the interval method proposed in this article has proved more reasonable than the mean value method. The interval method only asks whether an interval covers zero or not. The mean value method, which removes parameters whose estimates are close to zero, faces the problem of deciding what values count as "close" to zero (e.g., 10^−10, 10^−20, or 10^−30).
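The sifting loop described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names `ols_with_intervals` and `refine`, the always-retained intercept, and the use of a normal critical value in place of a Student-t quantile are assumptions made here.

```python
import numpy as np
from statistics import NormalDist

def ols_with_intervals(design, y, level=0.95):
    """Least-squares fit; returns estimates and lower/upper confidence bounds."""
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p)  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    # Assumption: normal critical value (about 1.96), adequate for large n;
    # a Student-t quantile with n - p degrees of freedom is the exact choice.
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return beta, beta - crit * se, beta + crit * se

def refine(X, y, names, level=0.95):
    """Drop every question whose interval covers zero, refit, and repeat."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, lower, upper = ols_with_intervals(design, y, level)
        covers_zero = (lower[1:] <= 0.0) & (upper[1:] >= 0.0)  # skip intercept
        if not covers_zero.any():
            break  # every remaining interval excludes zero
        keep = [k for k, drop in zip(keep, covers_zero) if not drop]
    return [names[k] for k in keep]

# Synthetic stand-in for the survey data: only Ex1 and Ex3 truly drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(181, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=181)
names = [f"Ex{j + 1}" for j in range(6)]
kept = refine(X, y, names)
print(kept)
```

All parameters whose intervals cover zero are removed together in each round, matching the round-by-round description in the text. If every question is removed, the function returns an empty list.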
The employees' section of the questionnaire data is used in this study to demonstrate the model refinement approach using 95% confidence intervals in a regression. Using question Ey1 ("I think my work ability is excellent") as an example to show the model refinement approach, we assign Y = Ey1 in the questionnaire for employees' work performance and satisfaction, while X = Ex1-24, being all 24 questions in the questionnaire for employees' work values. In other words, the question Ey1 is explained by the questions Ex1-24. The consequent repetitive sifting process to select the real contributing components/questions out of the 24 questions (Ex1-24) to Ey1 is listed in Tables 1-4 (adapted from Lin and Shen [30], Shen [31]). Each table presents the outcome of a new regression after the component sifting process. Each of the highlighted upper and lower bounds for a given component indicates that the 95% confidence interval covers the zero value in the regression analysis.
Removing those components/questions with 95% confidence intervals covering the zero value in the regression setting of Table 1 and rerunning a new regression of the remaining components leads to Table 2. Continuing this repetitive sifting process by rerunning the regression analysis for the remaining components in Table 2, we obtain Table 3. By the same component sifting process, Table 4 is derived from Table 3. The 95% confidence interval for each remaining component in Table 4 does not cover the zero value, implying that the remaining components are genuine contributing factors in explaining the component Ey1. Hence, it is statistically valid to stop the component sifting process at this point. It is noteworthy that the significance value of each remaining component from Table 2 to Table 4 decreases on average each time a new regression is conducted in the repetitive refinement approach. The removed components correspond to relatively high significance values while the remaining components correspond to successively declining significance values in each round of regression.

Statistical Significance of Question
The relative change of the statistical significance value before and after each round of the repetitive refinement approach in the regression setting is defined as:

$$\text{Relative change}_j = \frac{x_j^{f} - x_j^{i}}{x_j^{i}} \times 100\% \qquad (1)$$

where $x_j^{f}$ denotes the final statistical significance value for the jth component of the model, while $x_j^{i}$ denotes the initial statistical significance value for the jth component of the model.

The statistical significance is defined as follows: If the p-value is less than or equal to alpha, we say that the data are statistically significant at level alpha. In statistics (where "significant" means "corresponds to a real difference in fact") the term is used to indicate only that the evidence against the null hypothesis reaches the standard set by alpha (Moore and McCabe [32]). Since the lower the significance value of a component the higher will be its contribution to the model, a negative value for the relative change of the statistical significance in Equation (1) signifies that the effect of the corresponding component/question on the model is increased, while the opposite is true for the case of a positive value. Tables 5 and 6 list the relative change of the statistical significance as a percentage (%) for each question of Ey explained by Ex1-24 and for each question of Cy explained by Cx1-8, respectively.
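As a minimal sketch of how the relative change and its mean over the retained questions might be computed, consider the following. The function names and the three pairs of significance values are hypothetical and are not taken from Tables 5 and 6.

```python
def relative_change(initial, final):
    """Percentage change from the initial to the final significance value."""
    return (final - initial) / initial * 100.0

def mean_relative_change(pairs):
    """Average the relative change over (initial, final) pairs."""
    changes = [relative_change(initial, final) for initial, final in pairs]
    return sum(changes) / len(changes)

# Hypothetical (initial, final) significance values for three retained questions.
pairs = [(0.040, 0.010), (0.020, 0.015), (0.008, 0.004)]
for initial, final in pairs:
    print(f"{initial:.3f} -> {final:.3f}: {relative_change(initial, final):+.1f}%")
print(f"mean relative change: {mean_relative_change(pairs):+.1f}%")
```

A negative result means the significance value fell during refinement, which the text interprets as a stronger contribution of that question to the model.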
Table 5. Employees' part: relative change of the statistical significance for each question of Ey explained by Ex1-24.

In Table 5, a blank indicates that the question used to explain the corresponding question Ey in a model has been removed. All the questions used to explain the question Ey3 have been removed, implying that Ey3 ("My boss thinks I am doing a great job at work") is not explained by any of the questions Ex1-24. Such a question should be removed to improve questionnaire design for accurate evaluations of working characteristics. All the significance values of the remaining questions are decreased except for the four marked values. Such a decrease in the significance value reflects an increase in the effect of the question on a model, verifying that the remaining questions are the real contributing questions/factors for the refined model. The total mean relative change of the statistical significance of the remaining variables is −45%.
Similarly in Table 6, a blank indicates that the question used to explain the corresponding question Cy in a model has been removed. Again, the significance values of the remaining questions are decreased except for the two marked values. Such a decrease in the significance value verifies that the remaining questions are the real contributing questions/factors to the refined model. The total mean relative change of the statistical significance of the remaining variables is −52%. In particular, the question Cy7 "Staying and working for this company doesn't do me any good" needs to be explained by all eight questions Cx1-8 relating to organizational management. In other words, choosing whether to stay and work for the company depends on the entire range of the company's management strategies.

R-Squared Value of Regression Analysis
In the regression setting, the final R-squared value of each Ey for the employees' part through the repetitive refinement approach implemented in the linear series, refined linear series, Taylor series, and refined Taylor series models is listed in Table 7 (adapted from Lin and Shen [30], Shen [31]). The total mean R-squared value is decreased by 0.02 for the refined linear series model from the linear series model, signifying that the model refinement approach developed here does not materially affect the R-squared value when searching for the genuinely contributory questions for survey improvement. On the other hand, the Taylor series model increases the mean R-squared value by 0.19 from the linear series model, which greatly improves the modeling process in the multiple regression setting.

Similarly, the final R-squared value of each Cy for the cadres' part obtained by the repetitive refinement approach in the linear series, refined linear series, Taylor series, and refined Taylor series models is listed in Table 8 (adapted from Lin and Shen [30], Shen [31]). The total mean R-squared value is again decreased by 0.02 for the refined linear series model. The Taylor series model on average increases the R-squared value by 0.17 from the linear series model, greatly improving the modeling process. In Table 8, all the questions implemented in the Taylor series model achieve high R-squared values of greater than 0.85, implying a satisfactory result in modeling the causal explanations for questionnaire design.

Reliability and Validity
Verifications and error analyses were also conducted to compare the above results using the repetitive model refinement approach with those using methods of reliability and validity.
This study adopted Cronbach's alpha to represent the reliability in data analysis. Guilford [33] proposed a set of criteria for Cronbach's alpha. The standard value of Cronbach's alpha is 0.5. High alpha values (>0.7) mean high reliability while low ones (<0.35) mean low reliability. Table 9 shows that through the repetitive model refinement approach the number of questions was reduced and all the reliabilities were over 0.7, indicating that the sample was adequately stable and consistent.

Table 9. Reliability analyses (Cronbach's alpha).

| Questionnaire | Before deleting questions | After deleting questions |
|---|---|---|
| Employees' work values | 0.623 | 0.720 |
| Employees' work performance and satisfaction | 0.577 | 0.742 |
| Cadres' organizational management | 0.565 | 0.740 |
| Cadres' organizational commitment and identification | 0.590 | 0.780 |

After repeatedly running the screening process of the estimated parameters, almost all the remaining questions of the model for both the employees' and cadres' sections show decreased significance values with a total mean relative change of 49%, verifying that the remaining questions are indeed the real contributing ones to the models studied. In particular, the question "My boss thinks I am doing a great job at work" in evaluating employees' work performance cannot be successfully explained by the contents of the questionnaire relating to employee work values. Such a question should instead be evaluated by a manager within the repetitive model refinement approach. However, the question "Staying and working for this company doesn't do me any good" can be evaluated through the full content of the questionnaire relating to organizational management. In other words, an employee's decision to stay in the company is substantially dependent on the company's management strategies.
Further, limitations of the study indicate that the developed questionnaire design improvement should be applied to data with high reliability.
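Reliability in this study is measured by Cronbach's alpha. As a minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), the following uses only the Python standard library. The function name `cronbach_alpha` and the small score matrix are hypothetical and are not the survey data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha; items holds one list of scores per question,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))

# Hypothetical five-point scores: three questions, five respondents.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Recomputing alpha on the retained questions after each refinement gives the kind of before/after comparison reported in Table 9.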

Table 1. Multiple regression of the original questionnaire model.

Table 2. Multiple regression of the refined questionnaire model in the first round.

Table 3. Multiple regression of the refined questionnaire model in the second round.

Table 4. Multiple regression of the refined questionnaire model in the third round.

Table 6. Cadres' part: relative change of the statistical significance for each question of Cy explained by Cx1-8.

Table 7. Employees' part: Final R-squared values for linear series, refined linear series, Taylor series, and refined Taylor series models.

Table 8. Cadres' part: Final R-squared values for linear series, refined linear series, Taylor series, and refined Taylor series models.