Evaluation of the Nomological Validity of Cognitive, Emotional, and Behavioral Factors for the Measurement of Developer Experience

Background: Developer experience should be considered a key factor from the outset of development-platform use, but it has received little attention in the literature. Research Goals: The present study aimed to identify and validate the sub-constructs and item measures for the evaluation of developer experience toward the use of a deep learning platform. Research Methods: A Delphi study and a series of statistical methodologies, including the assessment of data normality, common method bias, and exploratory and confirmatory factor analysis, were used to determine the reliability and validity of the measurement model proposed in the present work. Results: The results indicate that the measurement model proposed in this work successfully ensures the nomological validity of the three first-order constructs of cognitive, affective, and behavioral components in explaining the second-order construct of developer experience at p < 0.05. Conclusions: The measurement instrument developed in the current work can be used to measure developer experience during the use of a deep learning platform. Implications: The results of the current work provide important insights for academia and practitioners in understanding developer experience.


Study Background and Purpose
Developer experience (DX) refers to the overall experience of developers while they develop systems, products, or services. The concept also includes the developer's feelings, motivations, characteristics, and activities [1]. DX is a special case of user experience (UX), which has been studied extensively, and it shares both the idea and philosophy of UX design. However, the main actor in DX is dualistic: the developer is both a tool user who enables system use and a producer who develops the tool [2]. DX may have a similar meaning to UX, but the former is limited to developers responsible for designing or developing the system for the end-user. The developer should have interest in and passion for the value of the application created for the user and should be motivated to develop it.
In contrast with the past, current developers are considered key participants or stakeholders in various business environments. Furthermore, because they are decision-makers, much attention has been paid to the developer's experience [3]. For example, the success of the Apple App Store and Google Play, which transformed the mobile market ecosystem, was due to the vast development ecosystem, or DX, in which developers can actively develop and register apps with interest.

Research Goal
This study aims to derive sub-factors and assessment questions to evaluate the DX perceived by developers who use a deep-learning (DL) platform. DL refers to a set of machine-learning (ML) algorithms that attempt high-level abstraction by combining various non-linear transformation techniques [4]; the term also covers frameworks that help developers build DL models faster. DL platforms, which implement DL technology, are expected to receive significant attention from developers, considering the advancement potential of industries based on DL technologies. A DL-based platform is essential for developing voice recognition, pattern recognition, and computer vision systems based on DL technology, which is considered a core technology of the Fourth Industrial Revolution.
For example, major information technology companies, including Netflix, Uber, and Airbnb, use DL platforms to analyze big data collected from consumers [5]. Developers must experience DX at a high level to provide a high level of UX for software users [6]. If DX can be evaluated using a systematic methodology, it can help improve development tools and environments and provide users with excellent UX [7]. This study extracted sub-factors and assessment questions that can monitor DX as a benchmark tool for evaluating the performance of a DL platform. It also evaluated the nomological validity of whether the benchmark tool can explain DX as a single concept at a statistically significant level. Figure 1 illustrates the research flow of the present work.

Figure 1. Research flow of the present work: Step 1, development of preliminary measurement items from a literature review; Step 2, Delphi study for the assessment of content validity; Step 3, survey study for the collection of item measures; Step 4, evaluation of reliability and validity for the higher-order constructs and item measures (data normality, common method bias, and internal consistency; exploratory factor analysis; confirmatory factor analysis covering construct validity, discriminant validity, nomological validity, and model fit).

Developer Experience (DX)
DX refers to the experience involving the interaction between development tools and developers in the software development process [8]. A complete understanding of DX can facilitate an understanding of the expectations, perceptions, and feelings of developers who use development tools. Furthermore, DX has a dualistic nature grounded in UX, in which the developer is both a system tool user and a system producer who anticipates the UX [2]. Understanding the relationship between the developer and the platform the developer uses is essential, because we can thereby predict whether the platform can satisfy developers and ensure usability and functionality. Fontão et al. (2017) [9] claimed that the factors that affect DX can be identified from three types of information: the factors that affect the developer's cognition toward the software development infrastructure, the developer's perceptions or emotions regarding the contribution to the ecosystem, and the value recognized by the developer's contribution. They claimed that if the effect on DX is analyzed from these three viewpoints, it can help maintain organizational strategy and raise the quality of the goal pursued by the organization and the framework that supports developers [9].
Parviainen et al. (2015) [10] argued that because software development is conducted through collaboration with team members, the DX perceived by team members affects the integrated development environment. They classified the factors that constitute DX into development rules and policies, dynamic work division according to circumstantial roles, and the collaboration mode between communities. For example, they suggested that DX may change depending on whether the team members who develop the software agree on a coding style. Furthermore, they discussed whether assigning clear roles for software development (e.g., manager, user interface developer, and backend developer) affects the perceived experience.

Deep-Learning (DL) Platform
The demand for ML-based artificial intelligence (AI) has increased. ML is an algorithm that self-learns from data and can execute work without relying on explicitly programmed code [11]. The ML technique has been established as an essential tool to process complex data and predict patterns in all sectors, including medicine, finance, environment, military, and science [12,13]. DL refers to an ML technique that performs tasks including image classification, voice recognition, and natural language processing [14,15].
A DL platform is a development environment provided to developers to build a DL model. Because potential business success can be obtained using DL techniques in various areas where logic and inference, prediction, or cognition function processing are required, investment in and development of the DL platform have increased, predominantly in global companies and academia [16]. The use of DL technology has increased in various sectors. Thus, case studies on developing and improving DL platforms toward developing and supporting new algorithms have also increased [17].
For example, studies on DL platform development combined with parallel processing technology, cloud computing power, and distributed storage technology have been conducted in recent years [18,19]. Nonetheless, no studies have examined the experience of developers who use DL platforms with respect to platform processes or tools, or evaluated those experiences. A DL platform should give developers the perception that it can create unique value and offer user-friendly platform functions. Developers should fully understand the value of the DL platform and perceive that the deliverables provided by the platform are critical to value creation, as is the tool's value in improving productivity.

Sub-Constructs of DX
Fagerholm and Münch (2012) [1] proposed a conceptual framework to evaluate DX (Figure 2). They showed that because software development requires creativity, developers specifically consider the infrastructure and their perspectives on the work, feelings, and value created when achieving goals. The DX framework proposed by the researchers consists of the cognitive, affective, and intentional (or conative) aspects of experience. Cognition includes attention, memory, and problem-solving ability. Affection includes the developer's feelings or emotions, such as positive emotion or pleasure. Finally, intention, or conation, includes the impulse, motivation, and desire required for software development. Inspired by the work of Fagerholm and Münch (2012), the present study explores the research question of whether the three sub-constructs of cognitive, affective, and conative factors constitute developer experience toward the DL platform.

Development of Preliminary Survey Questionnaire
Based on the studies by Ahn and Back (2018) [20], Back and Parks (2003) [21], Chowdhury and Salam (2015) [22], Khanal (2018) [23], Venkatesh, Speier, and Morris (2002) [24], and Venkatesh and Davis (2000) [25], this study constructed a preliminary questionnaire to evaluate the sub-constructs of DX. The operational definition of "cognition" in this study refers to the rational basis providing competitiveness, efficacy, value, or motivation for developers to use the DL platform. "Affection" is an emotional state that developers expect from the use of the DL platform. Finally, "conation" refers to the developer's will or desire to use the DL platform autonomously under volitional control.

Delphi Survey
This study conducted a Delphi survey with a panel of experts to evaluate the content validity of the preliminary survey questionnaire [26]. The Delphi survey was conducted between 4 January and 23 January 2021. Lynn (1986) [27] suggested that 3 to 10 expert panelists are sufficient for evaluating content validity. In this study, four experts who had worked in academia and industries related to DL technology and development for at least 10 years participated in the panel. Of the four experts who participated in the Delphi, two held a master's degree and the remaining two held a doctoral degree. The panel consisted of one four-year university faculty member, one principal researcher at a research institute, and two project managers with 15 years of industry and practice experience.
The Delphi evaluation tool was constructed based on closed-ended questions using a 7-point Likert scale (e.g., one point = "strongly disagree", and seven points = "strongly agree") and open-ended questions. The closed-ended questions were created to evaluate whether the preliminary survey questionnaire could assess the sub-constructs that corresponded to each question. Furthermore, free opinions about the suitability, understanding, and consistency of the terms used in the questions were collected through open-ended questions in addition to the question's appropriateness. In the present work, the one-round Delphi study was conducted by email.
The content validity ratio (CVR) of the responses to the closed-ended questions was calculated to verify the degree of agreement among the expert opinions obtained from the Delphi survey. According to Lawshe (1975) [28], the minimum acceptable CVR is 0.99 if the number of panelists is five. Thus, this study selected only the preliminary survey questions whose CVR was 1.0 to develop the questionnaire for the main survey. The excluded questions were not used in the main survey or the final analysis. As illustrated in Table 1, the calculated CVRs for measurement items 1, 4, 6, 7, 11 through 14, 20, and 21 were less than 1.0, and these ten items were discarded from the final analysis. With respect to the ten pilot items for affective factors, the calculated CVRs for items 1 through 4, 9, and 10 were less than 1.0, and these six measurement items were likewise eliminated from the final analysis (Table 2). Among the calculated CVRs for the measurement items of behavioral factors, items 4 and 6 had CVRs less than 1.0 (Table 3). Thus, a total of nine measurement items for behavioral factors were retained for the final analysis.
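Lawshe's CVR described above can be computed directly. The following is a minimal sketch (the function name is ours, not from the original study):

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's (1975) CVR: (n_e - N/2) / (N/2), where n_e is the number
    of panelists who rate an item as essential and N is the panel size."""
    half = n_panelists / 2
    return (n_essential - half) / half
```

With the four-person panel used here, only items endorsed by every expert reach the retention threshold of 1.0; three of four endorsements yield a CVR of 0.5.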

Study Subjects
For the main survey in this study, a questionnaire was distributed to 260 employees in industry, including S and L companies in South Korea, whose jobs involve DL technology planning, design, and development. In detail, S and L companies are information technology organizations that primarily provide software solutions and service-integration services. Their business is largely driven by the use of AI-based natural language and learning models, smart factories, blockchain technology, and cloud data analysis. The survey was conducted with an online survey tool from 8 February to 26 March 2021. Insincere responses were excluded, and survey data from 225 employees were used in the final analysis. Table 4 presents the demographic characteristics of the respondents whose data were used in the final analysis. The proportions of male and female respondents were 79.1% and 20.9%, respectively. The proportion of respondents in their 30s was 53.8% (the largest), followed by those in their 20s (35.1%) and 40s (10.7%). Respondents who had graduated from four-year university programs accounted for 84.4%, while those who held master's and Ph.D. degrees accounted for 10.7% and 4.9%, respectively.

Data Normality Evaluation Results
Table 5 presents the calculated mean, standard deviation, skewness, and kurtosis for the collected questionnaire values. For structural equation model analysis based on maximum likelihood estimation using IBM® SPSS® Amos, the collected data should follow a normal distribution [29]. In this study, the skewness and kurtosis of each survey question were calculated to evaluate the normality of the survey data [30]. The absolute values of the calculated skewness and kurtosis were less than 3.0 and 10.0, respectively, within the acceptable ranges proposed by Kline (2011) [31]. Consequently, the collected survey data did not violate normality.
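The normality screening above can be sketched as follows. The snippet computes population skewness and excess kurtosis for a single item's responses and applies the |skewness| < 3 and |kurtosis| < 10 cutoffs attributed to Kline (2011); the function names and sample data are illustrative, not the study's.

```python
def skewness(xs):
    # Population skewness: third central moment over the s.d. cubed.
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return (sum((x - m) ** 3 for x in xs) / n) / s2 ** 1.5

def excess_kurtosis(xs):
    # Population kurtosis minus 3 (a normal distribution gives 0).
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return (sum((x - m) ** 4 for x in xs) / n) / s2 ** 2 - 3

def passes_normality_screen(xs, skew_cut=3.0, kurt_cut=10.0):
    # Screening rule used in the text: both statistics within cutoffs.
    return abs(skewness(xs)) < skew_cut and abs(excess_kurtosis(xs)) < kurt_cut
```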

Exploratory Factor Analysis Results
An exploratory factor analysis was performed to verify the latent variable structure of the collected survey responses. For the fitness tests preceding the factor analysis, Bartlett's test of sphericity (χ²(325) = 4296.664, p = 0.000 < 0.05) and the Kaiser-Meyer-Olkin (KMO) measure (0.946 > 0.6) were calculated; both satisfied the minimum acceptable scores proposed by Kaiser (1974). Direct oblimin rotation and principal component analysis were used for factor rotation and factor extraction, respectively. The factor rotation converged in 14 iterations, and three components whose initial eigenvalues were greater than 1.0 were extracted (Table 6). The minimum acceptable factor loading was set to 0.5 in this study [32]. The factor loadings for C1, C9, C12, and C13, survey questions measuring cognition, were less than 0.5. Accordingly, these four survey questions and their values were excluded from the final analysis.
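The extraction rule used above (retain components whose eigenvalues of the correlation matrix exceed 1.0, the Kaiser criterion) can be illustrated with a small sketch; the correlation matrix below is invented for illustration and is not the study's data.

```python
import numpy as np

# Illustrative 3-item correlation matrix (symmetric, unit diagonal).
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# Eigenvalues of a symmetric matrix, sorted in descending order.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Kaiser criterion: count components with eigenvalue > 1.0.
n_components = int((eigenvalues > 1.0).sum())
```

For this toy matrix a single dominant component emerges; the eigenvalues always sum to the number of items (the trace of the correlation matrix).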

Internal Consistency Evaluation Results
Cronbach's alpha coefficients were calculated to verify the internal consistency of the evaluation tool designed to measure the three sub-constructs. As presented in Table 6, all of the coefficients were greater than 0.7, the minimum acceptable value proposed by Cortina (1993) [33] and DeVellis (2003) [12].
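Cronbach's alpha as applied above can be computed from item scores with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of totals). A minimal sketch (data layout and names ours):

```python
def cronbach_alpha(items):
    # items: one list per item, one score per respondent, equal lengths.
    k = len(items)

    def pvar(xs):
        # Population variance.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Per-respondent total scores across all items.
    totals = [sum(col) for col in zip(*items)]
    return k / (k - 1) * (1.0 - sum(pvar(it) for it in items) / pvar(totals))
```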

Confirmatory Factor Analysis Results
A confirmatory factor analysis was performed to evaluate the construct validity, model fit, and nomological validity of the measurement proposed in this study. Convergent validity and discriminant validity were assessed to evaluate construct validity.

Convergent Validity Evaluation Results
The guideline proposed by Fornell and Larcker (1981) [34] was used to evaluate the convergent validity of the measurement. Their guideline suggests that all standardized factor loadings of the observed variables should be greater than 0.6 at a statistical significance level of p < 0.05, and that the average variance extracted (AVE) and composite reliability values should be greater than 0.5 and 0.7, respectively. Among the calculated standardized factor loadings, B8 (0.599) and B9 (0.500), which were used to measure the conation sub-construct, did not satisfy the minimum acceptable score. Furthermore, the root mean square error of approximation (RMSEA), one of the goodness-of-fit indices calculated to evaluate model fit, was 0.086, which did not satisfy the acceptable threshold of 0.08 or less [35]. Thus, this study improved the model fit of the measurement by removing the two observed variables B8 and B9 and then observing the modification indices obtained through the confirmatory factor analysis.
The covariance modification indices for the observed variable pairs C4 and C5, C6 and C7, and C6 and C10 were all 20.0 or greater; thus, error covariances were additionally specified for each pair [35]. Table 7 displays the confirmatory factor analysis results for the modified measurement. The standardized factor loading of each observed variable was 0.6 or greater, and the AVE and composite reliability values were greater than 0.5 and 0.7, respectively. Thus, the measurement satisfied convergent validity.
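The AVE and composite reliability thresholds applied above follow the standard formulas over standardized loadings: AVE is the mean of the squared loadings, and composite reliability is (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The loadings in the test values below are illustrative, not those in Table 7.

```python
def average_variance_extracted(loadings):
    # AVE: mean of the squared standardized loadings.
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    # CR: (sum of loadings)^2 over itself plus the summed error
    # variances, where each error variance is 1 - loading^2.
    s = sum(loadings)
    err = sum(1.0 - l * l for l in loadings)
    return s * s / (s * s + err)
```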

Discriminant Validity Evaluation Results
The discriminant validity of the measurement was evaluated, following the guideline proposed by Fornell and Larcker (1981) [34], by checking whether the square root of each construct's AVE was larger than the estimated correlation coefficients between constructs.
As presented in Table 8, the AVE square root of the cognition sub-construct was 0.787, which was greater than the estimated correlation coefficient (=0.678) between the cognition and affection sub-constructs and the correlation coefficient (=0.754) between the cognition and conation sub-constructs. The AVE square root of the affection sub-construct was 0.778, which was greater than the correlation coefficient 0.709 between the affection and conation sub-constructs. Thus, the discriminant validity of the measurement satisfied the criterion proposed by Fornell and Larcker (1981) [34].
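The Fornell-Larcker check described above can be sketched as a simple comparison. The square roots of AVE for cognition (0.787) and affection (0.778) and the three correlations are taken from the text; the conation value (0.78) is an assumed placeholder, since the text does not report it.

```python
# Square roots of AVE per construct; the conation entry is assumed.
sqrt_ave = {"cognition": 0.787, "affection": 0.778, "conation": 0.78}

# Inter-construct correlation estimates reported in the text.
correlations = {
    ("cognition", "affection"): 0.678,
    ("cognition", "conation"): 0.754,
    ("affection", "conation"): 0.709,
}

def fornell_larcker_ok(sqrt_ave, correlations):
    # Discriminant validity holds if every construct's sqrt(AVE)
    # exceeds its correlation with every other construct.
    return all(
        sqrt_ave[a] > abs(r) and sqrt_ave[b] > abs(r)
        for (a, b), r in correlations.items()
    )
```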

Model Fit Evaluation Results
In evaluating the model fit of the measurement (Table 9), the chi-square (χ²) statistic, normed fit index (NFI), comparative fit index (CFI), Tucker-Lewis index (TLI), incremental fit index (IFI), and RMSEA were calculated. All indices satisfied the minimum acceptable values proposed in the previous literature [32,36,37].
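For reference, several of the indices named above can be computed from chi-square statistics under their standard definitions; the baseline-model and hypothesized-model values used in the test below are illustrative only, not the study's results.

```python
import math

def cfi(chi0, df0, chim, dfm):
    # Comparative fit index from the null model (chi0, df0) and the
    # hypothesized model (chim, dfm).
    d0 = max(chi0 - df0, 0.0)
    dm = max(chim - dfm, 0.0)
    return 1.0 if max(d0, dm) == 0 else 1.0 - dm / max(d0, dm)

def tli(chi0, df0, chim, dfm):
    # Tucker-Lewis index (non-normed fit index).
    return (chi0 / df0 - chim / dfm) / (chi0 / df0 - 1.0)

def rmsea(chim, dfm, n):
    # Root mean square error of approximation for sample size n.
    return math.sqrt(max(chim - dfm, 0.0) / (dfm * (n - 1)))
```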

Nomological Validity Evaluation Results
The nomological validity was assessed to evaluate whether the first-order sub-constructs (cognition, affection, and conation) construct the second-order construct (DX) at a statistically significant level [38]. The confirmatory factor analysis results in Table 7 confirm that the standardized factor loadings of the three first-order constructs were all greater than 0.6 at the significance level p < 0.05 (cognition = 0.849; affection = 0.799; conation = 0.888) (Figure 3). Thus, the nomological validity verified that the three first-order constructs can form the second-order construct.

Results Summary
The results of this study verified three factors, cognition, affection, and conation, as critical sub-constructs that construct the DX of a DL platform at a statistically significant level. Based on the confirmatory factor analysis, the calculated standardized factor loadings of the three sub-constructs were 0.849 for cognition, 0.799 for affection, and 0.888 for conation. The standardized factor loading represents the correlation between a latent construct and its observed variables. Each of the three sub-constructs had a high and relatively uniform correlation with DX, the latent construct.
Based on the results of this study, the DX of a DL platform can be a construct that can be evaluated based on the developer's competitiveness and value perceived from the platform, affection (expressed positively or negatively), and willingness to use the platform.

Conclusions
A DL platform is a development environment provided to developers that allows them to build a DL model. Since business success can potentially be obtained using DL techniques in various areas where logic, inference, prediction, or cognition function processing are required, the DL platform has seen an increase in investments and development, predominantly in global companies and academia [17]. The use of DL technology has increased across various sectors. However, no studies have been conducted on the experience of the developers who use DL platforms, in terms of evaluating their experiences with platform processes or tools.
Given the need to evaluate and research the experiences of developers, who are the main users of the deep learning platform, this study has interdisciplinary significance in proposing a reflective model for evaluating DX when using a DL platform. Furthermore, the evaluation is verified at a statistically significant level. Future research should examine external validity to verify whether the evaluation questionnaire and sub-constructs proposed in this study can explain the DX of other developer platforms in addition to the DL platform.
It should be acknowledged that the present work did not include potential confounding factors or profiles, such as the type of deep learning platform respondents use in their companies or their level of work experience, in the data analysis. These factors might affect the interpretation of our findings. Future work should replicate our findings while controlling for confounding variables to provide robust support for the nomological validity of the measurement model proposed in the present work.
Author Contributions: Conceptualization, methodology, formal analysis, data curation, and writing (original draft preparation): H.L.; writing (review and editing) and funding acquisition: Y.P. All authors have read and agreed to the published version of the manuscript.
Funding: The publication of this paper has been partly supported by Graduate School of Techno, Kookmin University.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Not applicable.