Psychometric Properties of Heavy Work Investment Measures: A Systematic Review

In recent years, the study of heavy work investment (HWI) has diversified considerably across organizational fields of application, for example, occupational health, human resources, and quality at work, among others. However, to date, no systematic review has examined the methodological quality of the instruments designed to measure HWI. Therefore, the present systematic review examines the psychometric properties of three main measures of HWI: Workaholism Battery (WorkBAT), Work Addiction Risk Test (WART), and Dutch Work Addiction Scale (DUWAS). Five electronic databases were systematically searched, selecting psychometric articles. Of the 2621 articles identified, 35 articles published between 1992 and 2019 met all inclusion criteria. The findings indicated that most of the articles focused on reviewing psychometric properties, conducted their analyses from classical test theory, collected validity evidence based on internal structure and relations to other variables, and estimated the reliability of scores through the internal consistency method. Of the instruments reviewed, the DUWAS is the one with the highest methodological quality. Recommendations are made for future research to address the psychometric study of these instruments based on recent advances in the field of organizational measurement.


Introduction
Heavy work investment (HWI) is a concept developed by Snir and Harpaz [1], who define it in terms of two central axes: the time and the effort invested in work. Based on Weiner's attributional framework, they propose three main categories of possible predictors of HWI: background (e.g., gender, parenthood, education), internal (e.g., work addiction, workaholism), and external variables (e.g., financial needs, employer demands) [2]. The present review focuses on the possible internal predictors, i.e., work addiction and workaholism. Current theoretical proposals of HWI use the job demands-resources (JD-R) model, considering this construct as a continuum consisting of three main parts: antecedents, dimensions, and outcomes [3].
HWI is a construct that underlies many others, such as work addiction, work engagement, passion for work, and workaholism, among others [4]. That is, HWI is seen as a higher-level umbrella construct, while work addiction, work engagement, passion for work, and workaholism sit at a lower level, subsumed by HWI [3]. The time invested in work refers to working a large number of hours, and work intensity is the effort involved in the various tasks that a person performs at the physical and intellectual level. Regarding internal predictors, work addiction involves compulsive behaviors due to the pressure received by workers [5]. Therefore, work addiction is marked by a heavy investment of time. It is important to note that HWI is not in itself a negative or positive construct; this depends on the context in which it develops. Thus, a positive implication of this variable is work commitment, where workers put dedication and strength into their work activities [6].
Several studies have explored the relationship between workaholism and work engagement, showing significant associations [7], although the results vary, with both positive and negative associations reported [2]. In this sense, HWI has been related to greater job security, better career development opportunities, and higher salary. In parallel, HWI may also be negatively related to individual affective and behavioral outcomes that are to be avoided [8,9]. For example, workaholism tends to be associated with poorer mental and physical health, emotional and cognitive exhaustion, poor sleep habits, cardiovascular problems, poor social relationships, and work-life conflicts [10][11][12]. Thus, it is important to have tools for a correct assessment of the internal factors of HWI (workaholism or work addiction). Workaholism or work addiction includes one of the components of HWI, the heavy time investment in work; however, it does not necessarily include the other component of HWI, the intensity of work during that period [1]. Thus, workaholism or work addiction is related to one aspect of HWI.
In this regard, this review will focus on assessing the quality of three of the most widely used HWI measures reported in the literature: Workaholism Battery (WorkBAT), Work Addiction Risk Test (WART), and Dutch Work Addiction Scale (DUWAS) [13]. In addition, the definitions of standards for educational and psychological tests will be used to ensure consistency with current psychometric guidelines for instrumental studies. To date, there has not been a published study that systematically reviews research on the psychometric properties of HWI measures. This is a significant gap given the increased interest in research beyond the organizational setting. The validity of research findings is contingent on the use of tools with appropriate validity and reliability evidence that measure the constructs of interest. Therefore, the evaluation of psychometrically robust instruments through a systematic review is warranted.
The concept of validity has evolved considerably over time. The 1985 Standards for Educational and Psychological Testing maintain that validity is a unitary concept and that the validity of a test is construct validity [14]. The 2014 Standards insist that the term test validity, and with it the notion of types of validity, is inapplicable; rather, evidence must be provided from various sources regarding the purpose for which the test or measurement instrument was developed. They reiterate that it is not the test that is validated, but the inferences made from the subjects' scores for a given purpose [15].
From this perspective, not only the test creator but also the test user is responsible for the validity of the test. Furthermore, the validity of a test is not established once and for all but is a continuous process of gathering evidence. From a scientific point of view, the only admissible validity is construct validity [16]. Therefore, the logic that underlies it and the methods used to determine it are, in general, those of the scientific method. Its evidence comes from various sources, which presuppose a clear definition of the construct and, where necessary, of its dimensions or facets.
It is necessary to keep in mind that no study tests or validates a complete theory, but only some of the inferences that can be derived from it. For constructs, negative results can be interpreted in three directions: the test may not measure the construct; the theoretical framework may be flawed, making it possible for incorrect inferences to be made; or the study design may not allow for proper testing of the hypotheses. Poor training and practice in psychometrics and research can make the interpretation of negative results ambiguous. Finally, it should always be kept in mind that unexpected relationships, as well as predicted ones, are part of the nomological network of the construct and provide arguments for the meaning of the scores.
Reliability, in turn, is the degree to which test scores for a group of subjects are consistent in repeated applications of a measurement procedure, from which the dependability and consistency of an individual subject's score are inferred [17]. It is implicit in this definition that test scores are obtained on different occasions under the same conditions of administration and scoring; it also follows that reliability refers to the precision of the measurement, regardless of what the test measures, and that the reliability of the scores is relative, since it depends on the characteristics of the group of subjects in which it is estimated [18].

Objectives
The objectives of the present review are to: (1) systematically identify three of the main measures of HWI in the literature, (2) evaluate the psychometric properties presented in the reviewed studies, and (3) determine whether there is a gold standard measure of HWI.

Study Design
The present review followed the official PRISMA guidelines [19] on data identification, collection, and analysis. In this sense, the article is a systematic review of instrumental studies on three of the most commonly used measures of HWI: Workaholism Battery (WorkBAT), Work Addiction Risk Test (WART), and Dutch Work Addiction Scale (DUWAS) [13].

Literature Search
First, five selection criteria were determined: (1) instrumental or psychometric studies [20], where the main objective of the article was the creation, adaptation, or psychometric analysis of an instrument measuring HWI or any of its components (Workaholism or Work addiction), specifically, the WorkBAT, WART, and DUWAS; (2) conducted in any country; (3) published up to 2020; (4) peer-reviewed articles; and (5) published in the Spanish or English language.
Three groups of keywords and phrases were used to search for potential articles. The first group is related to psychometric analysis (psychometric, validation, validity, reliability, adaptation, and dimensionality). The second group is related to HWI (heavy work investment, workaholism, work addiction, passion to work, job demands, work craving, work engagement, addiction to work, passion towards work, passion for work, and heavy-work investment). The third group is related to self-report measures (questionnaire, measure, assessment, tool, instrument, scale, inventory, and battery). The final search was performed in all databases on 17 July 2021. All articles published up to 2020 were selected.

Database Search Strategy

Scopus: TITLE-ABS-KEY (psychometric* OR validation OR validity OR reliability OR adaptation OR dimensionality) AND TITLE-ABS-KEY ("heavy work investment" OR workaholism OR "work addiction" OR "passion to work" OR "job demands" OR "work craving" OR "work engagement" OR "addiction to work" OR "passion towards work" OR "passion for work" OR "heavy-work investment") AND TITLE-ABS-KEY (questionnaire OR measure* OR assessment OR tool OR instrument OR scale OR inventory OR battery) AND PUBYEAR < 2021 AND (LIMIT-TO (DOCTYPE, "ar")) AND (LIMIT-TO (LANGUAGE, "English") OR LIMIT-TO (LANGUAGE, "Spanish"))

Web of Science: TS = (psychometric* OR validation OR validity OR reliability OR adaptation OR dimensionality) AND TS = ("heavy work investment" OR workaholism OR "work addiction" OR "passion to work" OR "job demands" OR "work craving" OR "work engagement" OR "addiction to work" OR "passion towards work" OR "passion for work" OR "heavy-work investment") AND TS = (questionnaire OR measure* OR assessment OR tool OR instrument OR scale OR inventory OR battery) Refined By: NOT Publication Years: 2021; Document Types: Articles; Languages: English or Spanish

PsycNET: (Any Field: psychometric* OR Any Field: validation OR Any Field: validity OR Any Field: reliability OR Any Field: adaptation OR Any Field: dimensionality) AND (Any Field: "heavy work investment" OR Any Field: workaholism OR Any Field: "work addiction" OR Any Field: "passion to work" OR Any Field: "job demands" OR Any Field: "work craving" OR Any Field: "work engagement" OR Any Field: "addiction to work" OR Any Field: "passion towards work" OR Any Field: "passion for work" OR Any Field: "heavy-work investment") AND (Any Field: questionnaire OR Any Field: measure* OR Any Field: assessment OR Any Field: tool OR Any Field: instrument OR Any Field: scale OR Any Field: inventory OR Any Field: battery) AND Document Type: Journal Article AND Year: 0 To 2020

Psychology and Behavioral Sciences Collection via EBSCO: (psychometric* OR validation OR validity OR reliability OR adaptation OR dimensionality) AND ("heavy work investment" OR workaholism OR "work addiction" OR "passion to work" OR "job demands" OR "work craving" OR "work engagement" OR "addiction to work" OR "passion towards work" OR "passion for work" OR "heavy-work investment") AND (questionnaire OR measure* OR assessment OR tool OR instrument OR scale OR inventory OR battery) limit year 2020

MEDLINE via Ovid: (psychometric* OR validation OR validity OR reliability OR adaptation OR dimensionality) AND ("heavy work investment" OR workaholism OR "work addiction" OR "passion to work" OR "job demands" OR "work craving" OR "work engagement" OR "addiction to work" OR "passion towards work" OR "passion for work" OR "heavy-work investment") AND (questionnaire OR measure* OR assessment OR tool OR instrument OR scale OR inventory OR battery) limit 1 to year = "1860-2020"
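The core Boolean expression shared by the database strategies can be assembled from the three keyword groups. The following is a minimal sketch, not the procedure used in the review: the helper names `or_group` and `build_query` are hypothetical, and database-specific field tags and limits (for example, TITLE-ABS-KEY or the year limits) are deliberately left out.

```python
# Keyword groups as they appear in the database search strategies.
PSYCHOMETRIC = ["psychometric*", "validation", "validity", "reliability",
                "adaptation", "dimensionality"]
HWI = ["heavy work investment", "workaholism", "work addiction",
       "passion to work", "job demands", "work craving", "work engagement",
       "addiction to work", "passion towards work", "passion for work",
       "heavy-work investment"]
MEASURE = ["questionnaire", "measure*", "assessment", "tool", "instrument",
           "scale", "inventory", "battery"]


def or_group(terms):
    """Join terms with OR, quoting phrases that contain a space or hyphen."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Combine the OR-groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(PSYCHOMETRIC, HWI, MEASURE)
print(query)
```

Keeping the groups as separate lists makes it straightforward to wrap each group in a database-specific field tag when adapting the expression to a given search interface.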

Selection Process
The search results were initially screened by title and abstract to exclude research that did not meet the selection criteria. Subsequently, the remaining studies were retrieved from the different databases and the full articles were evaluated according to their relevance to meet the stipulated selection criteria.
After excluding 795 duplicate records and 1773 studies that did not meet the selection criteria (because they were qualitative studies, empirical studies, written in a language other than Spanish or English, or psychometric studies of other instruments), 53 articles were selected from the review of titles and abstracts. Of these, 50 articles were retrieved and reviewed in full, yielding a final sample of 35 articles containing 37 studies (one of the articles analyzed the three instruments of interest independently). The flow diagram of the study selection is shown in Figure 1.
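The counts reported for the selection process can be checked with simple arithmetic. The sketch below only restates the figures given in the text; the variable names are illustrative.

```python
# Counts reported in the selection process.
identified = 2621             # records identified across the five databases
duplicates = 795              # duplicate records removed
excluded_at_screening = 1773  # records not meeting the selection criteria

# Articles remaining after title and abstract screening.
after_screening = identified - duplicates - excluded_at_screening

full_text_reviewed = 50       # articles reviewed in full
included_articles = 35        # articles meeting all criteria

# One article analyzed the three instruments independently, so it
# contributes three studies instead of one.
included_studies = (included_articles - 1) + 3

print(after_screening, included_studies)
```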

Coding Process
The coding was carried out considering the following information from the articles analyzed: (a) instrument name, authors, and year; (b) study design; (c) number of items in the instrument; (d) dimensions it measures; (e) number of response options; (f) sample size; (g) participant characteristics; (h) mean and standard deviation of participant age; (i) country of origin of the sample; (j) psychometric theory used for the analyses; (k) validity evidence collected; and (l) reliability evidence collected.

Literature Analysis
After selection and coding of the articles, 35 articles met all the requirements. However, one of the articles contained three studies, one for each of the target instruments (WorkBAT, WART, and DUWAS). Therefore, 37 studies were compiled in the database for further analysis. The characteristics of each study were organized in Table S1 (Supplementary Materials). Based on the structured information, the results were elaborated to methodologically describe the three main measures of HWI.

Characteristics of the Studies
Of the 37 studies analyzed (Table 2), 13 belong to WorkBAT, 12 correspond to WART and 12 worked with DUWAS. The studies were published between 1992 and 2019. Regarding the study design of the articles reviewed, most of them (n = 22) focused on reviewing the psychometric properties (validity and reliability) of the WorkBAT, WART, and DUWAS. Likewise, a significant number of adaptations were reported (n = 12), where they sought to adjust the original tests to different cultural contexts. Finally, three studies were found where the tests that are the main focus of this review were developed or constructed [21][22][23].

Internal structure: The two-factor (no work involvement) and three-factor (CFA) models had an acceptable fit.
Relations to other variables: Drive and Work enjoyment were correlated with work stress, burnout and subjective health complaints. Work enjoyment was also correlated with work engagement components.

Measurement theory: CTT
Internal structure: Three-factor model (CFA), good fit and factor loadings between 0.36 and 0.95.
Relations to other variables: WART15-PBV correlated with the DUWAS (r = 0.90) and the correlations between the factors of both tests were greater than 0.50. WART15-PBV also correlated with general health perception (r = 0.29).

Measurement theory: CTT
Internal structure: Related two-factor model (CFA), acceptable fit, factor loadings between 0.38 and 0.77; in addition, the correlation between factors was 0.76.
Relations to other variables: Self-reports and peer-reports of workaholism (UWES answered by spouse, boyfriend, girlfriend, friend, or colleague) correlated: Workaholism (r = 0.52), WE (r = 0.50), and WC (r = 0.43). Workaholism and its scales showed positive correlations with overcommitment, the actual number of hours worked per week, burnout (emotional exhaustion), work engagement (Absorption), and intrinsic aspects of the job.

Measurement theory: CTT
Internal structure: Related two-factor model (CFA), an acceptable fit in both samples. However, a second-order model showed a better fit (WE: working frantically and working long hours; WC: obsessive work drive and unease if not working) with loadings greater than 0.50. Likewise, a four-factor related model (with the first-order factors of the previous model) also indicated a good fit.
Factorial invariance: Second-order factor structure showed reasonable measurement invariance and stability of factor structure across the two samples and time in the Finnish sub-sample of managers with two measurement points two years apart.
Internal consistency: Addiction to work (α = 0.86).

Study design: Psychometric properties
Dimensions: Workaholism (Working excessively (WE) and Working compulsively (WC)).
Measurement theory: RMT
Internal structure: The parallel analysis of the residuals shows that the work engagement has two significant components. However, the eigenvalues are below the cut-off point (2) for both factors.
Internal consistency: Person separation reliability R, which scored 0.49 in the WE and 0.56 in the WC.

Regarding the number of items, the instruments analyzed varied across studies for different reasons: (1) short versions were used; (2) items were eliminated due to methodological problems; or (3) only some factors of the instrument were studied. Likewise, the dimensions measured by the tests also changed from one study to another, since the structures were not fully replicated in different samples. Nevertheless, the DUWAS was the instrument that most often showed a two-factor structure (nine studies). In terms of the number of response options, the DUWAS (from "almost never" to "almost always") and the WART (from "never true" to "always true") used almost the same four-point Likert scale across studies. The WorkBAT was usually answered on a five-point Likert scale (from "strongly agree" to "strongly disagree").
As for the sample size, in most cases it did not exceed 500 participants, although larger samples are necessary to obtain more robust and stable results, especially considering the diversity of structures presented by these tests. In some studies, however, the sample was larger than 1000 people. The participants in the studies presented diverse characteristics, mainly workers from different economic sectors (e.g., manufacturing, retail, service, education, or medical). Moreover, the age of the participants was generally between 30 and 50 years old. Finally, the countries where these instruments have been most tested are the United States, Canada, Norway, the Netherlands, Italy, and Brazil, as well as others located in Asia and Oceania. Apart from Brazil, no studies were found in Latin America, and none were found in Africa.

Measurement Theory
The measurement theory that predominated in the studies was Classical Test Theory (CTT), and only one study used Rasch Measurement Theory (RMT) [54]. No study used Item Response Theory (IRT). In this regard, it is important to note that some studies did not use a specific theory, because their objective was to collect evidence of validity and they did not focus on the analysis of items or the reliability of the scores [37,38,40]. CTT proposes a linear relationship between the observed score, the true score, and the measurement error, with the main limitation being that its results depend on the persons sampled and on the test used [55]. In contrast, IRT seeks to explain the relationship between a person's ability and the probability of answering an item correctly, considering various item characteristics such as difficulty, discrimination, or pseudo-guessing [56]. IRT is considered a descriptive model since it tries to explain as much of the variance as possible [57]. Finally, RMT is a prescriptive model, in which it is of interest to know whether the data fit the measurement model and which, like IRT, focuses on the individual analysis of the items based on the interaction between an item and a person [58].
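The item-level relationship described for IRT and RMT can be illustrated with the standard dichotomous item response functions. This is a sketch under simplifying assumptions: the HWI instruments use Likert-type items, which would call for polytomous models, and the function names and parameter values below are illustrative only.

```python
import math


def irf_3pl(theta, a, b, c):
    """Three-parameter logistic model.

    theta: person trait level; a: discrimination; b: difficulty;
    c: pseudo-guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def irf_rasch(theta, b):
    """Rasch model: discrimination fixed at 1, no pseudo-guessing."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)


# A person whose trait level equals the item difficulty has a 0.5
# probability of endorsing the item under the Rasch model.
print(irf_rasch(theta=0.0, b=0.0))

# With pseudo-guessing, the probability at theta == b is c + (1 - c) / 2.
print(irf_3pl(theta=0.0, a=1.5, b=0.0, c=0.2))
```

By comparison, CTT works at the level of the total score through the decomposition X = T + E (observed score equals true score plus error), so it offers no equivalent item-by-person function.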

Validity
Regarding the collection of validity evidence, the studies analyzed focused mainly on two aspects: the internal structure and the relationship with other variables. However, two studies collected evidence based on test content. This source of validity evidence refers to the degree to which the content (domain definition, domain representation, domain relevance, and appropriateness of construction processes) of a test is congruent with the measurement objectives [59].

Validity Evidence Based on Test Content
Two studies collected this source of evidence based on the presentation of the items and their classification by a group of students or experts in the field, a method different from the one usually used to collect this evidence of validity, which consists of a group of judges giving their assessment of the items according to certain features (representativeness, relevance or clarity). However, the method used in these studies has the characteristic of analyzing the representativeness of the items [37,38]. Both studies belonged to the WART.

Validity Evidence Based on Internal Structure
The three tests studied are multidimensional, so this source of evidence is one of the most widely used to corroborate how the instruments are structured according to their items. The techniques used for this purpose were mainly three: principal components analysis (PCA), exploratory factor analysis (EFA), and confirmatory factor analysis (CFA). CFA was usually employed in studies reviewing psychometric properties as the first option, but, in cases where the fit was poor, a less restrictive method, such as PCA or EFA, was chosen. Of the three psychometric instruments, the DUWAS replicated its two-factor structure the most often. Thus, in nine studies, it was observed that the DUWAS was made up of the factors Working excessively and Working compulsively. Regarding the WorkBAT, a three-factor structure (Work involvement, Work enjoyment, and Drive) was observed in eight studies, although a two-factor structure (Work enjoyment and Drive) was also found in four studies. Finally, the WART showed a unidimensional structure in six studies, while four studies reported a five-factor structure (Compulsive tendencies, Control, Impaired communication, Self-worth, and Inability to delegate).
Within this source of validity evidence, factorial invariance analysis was also employed by some studies to examine whether the structure of the instrument was invariant across the participants' countries of origin [23,53]. Factorial invariance was tested through multi-group confirmatory factor analysis (MG-CFA), although in some cases the description of the procedure and results was not clear.

Validity Evidence Based on Relations to Other Variables
Since HWI is a relatively recent construct, it is necessary to form a nomological network to determine with which other variables it is related and with which it diverges. In this sense, this source of validity evidence made it possible to relate the total and factor scores of the WorkBAT, WART, and DUWAS to other variables such as Job stress, Job involvement, Time on the job, Overtime worked, Job satisfaction, and other variables linked to the organizational field. In most cases, simple correlation coefficients or simple linear regressions were used, with few studies analyzing the relationship between variables from a more advanced perspective such as structural equation models.

Reliability
The reliability of the scores was mostly evaluated by analyzing the internal consistency of the items using the alpha coefficient. However, in no case were the assumptions required by this reliability estimator (unidimensionality, absence of correlated errors, and tau-equivalence of the measurement model) verified. In addition, only one study reported confidence intervals for coefficient alpha. The DUWAS was the instrument that most often presented good levels of reliability (greater than 0.70). However, in some studies, it was found to be below this criterion. The WorkBAT and the WART had greater reliability problems in some of their dimensions, probably due to the number of factors (usually four) and the number of items that composed them.
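For reference, coefficient alpha can be computed from the item variances and the variance of the total score. The sketch below uses the standard formula with a small hypothetical data set (six respondents, four items on a four-point scale); it does not reproduce any analysis from the reviewed studies and does not check the assumptions listed above.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(scores[0])
    items = list(zip(*scores))  # regroup the data as one tuple per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum_item_variances / total_variance)


# Hypothetical responses (rows: respondents, columns: items).
data = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 1],
    [4, 4, 3, 4],
]

alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

Because alpha is a point estimate that depends on the sample, reporting a confidence interval alongside it, as only one reviewed study did, gives a clearer picture of its precision.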
In addition, some studies evaluated reliability through temporal stability (the test-retest method), which is necessary to determine whether test scores are stable over time. In most cases, the relationship between scores at time 1 and time 2 was low (less than 0.70) [28]. However, this could be influenced by the use of the simple correlation coefficient to assess the agreement between the two measures. Only two studies used appropriate statistics for this purpose, such as the intraclass correlation coefficient [13] and Lin's concordance correlation coefficient [44].
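The difference between association and agreement can be made concrete with a small example. The sketch below compares the Pearson correlation with Lin's concordance correlation coefficient on hypothetical test-retest scores in which every respondent scores exactly four points higher at time 2; the data and function names are illustrative only.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: measures linear association only."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient: measures agreement.

    Uses population (1/n) moments and penalizes any shift in means.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical scores: perfect rank preservation, systematic shift of +4.
time1 = [10, 12, 14, 16, 18]
time2 = [score + 4 for score in time1]

print(pearson_r(time1, time2))  # perfect association
print(lin_ccc(time1, time2))    # reduced by the systematic shift
```

In this example the correlation is perfect even though no respondent obtained the same score twice, which is why a concordance statistic is better suited to judging temporal stability.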

Discussion
The present review aimed to (1) systematically identify three of the main measures of HWI in the literature, (2) evaluate the psychometric properties presented in the reviewed studies, and (3) determine whether there is a gold standard measure of HWI. Regarding the first objective, three measurement instruments were analyzed, the Workaholism Battery (WorkBAT), Work Addiction Risk Test (WART), and Dutch Work Addiction Scale (DUWAS), which are the main measures with which various studies approach the phenomenon of HWI. Regarding the second objective, the empirical evidence on the validity and reliability of the instruments' scores was systematized. Finally, regarding the third objective, the DUWAS is the test that shows the best performance, because its two-factor structure was replicated and it presented better levels of reliability in most studies. However, it is only slightly superior to the WorkBAT and the WART, so it cannot be concluded that it is the optimal measure of HWI. Thus, further studies with these instruments, based on more robust methodologies, are needed.
Regarding the measurement theory used, the tests have not been examined through Item Response Theory (IRT) or through different models of Rasch Measurement Theory (RMT), so the performance of the items and the tests in people with different levels of HWI is not known. That is, these models make it possible to identify at which trait levels the instruments are more or less reliable, whereas under CTT reliability is only estimated globally. Moreover, from these two perspectives, it is possible to construct computerized adaptive tests, where workers do not always have to answer the same items or the same number of items, because the items are adapted to their responses and their trait levels.
Regarding validity, it is necessary to provide evidence based on the content of the items. The two studies found that collected this evidence are more than 20 years old, so the context in which the items were evaluated is different from the current one. Likewise, two additional sources of validity evidence have not been studied to date. The first is evidence based on response processes, which would reveal which emotions, feelings, thoughts, memories, and behaviors respondents associate with the items of the instruments when they read them. This would help support the claim that the items are indeed assessing aspects of HWI. The second is evidence based on the consequences of applying the test, which involves aspects that go beyond the assessment itself, such as the actions that could be taken after learning the results in a particular group.
Regarding the reliability of the measures, the results were very variable, with values below the minimum acceptable level (0.70) in many cases. However, it is important to note that in no case were the assumptions of the alpha coefficient verified, so the estimates may be underestimated or distorted. Likewise, as with other statistical techniques, it is necessary to consider the nature of the items, since the WorkBAT, WART, and DUWAS use Likert-type scales, whose level of measurement is ordinal. Therefore, reliability estimators should be calculated from a polychoric correlation matrix, which allows a coherent evaluation of the internal consistency of the scores. In addition, temporal stability was examined in several studies, although the coefficient used was not the most appropriate for determining whether the scores changed substantially from one assessment to another. In almost all cases, a correlation coefficient was used, which only indicates whether the two sets of scores are related. In these cases, it is preferable to use a coefficient that assesses the concordance (agreement) of the scores.
Currently, there are systematic reviews of measures of different variables relevant to the organizational field, for example, stakeholder engagement [60] or human resource management systems [61]. However, as mentioned at the beginning of the review, this is the first work that systematically evaluates the psychometric properties of HWI measures. The closest precedent is the study by Andreassen et al. (2013) [13], which empirically examined the psychometric properties of the WorkBAT, WART, and DUWAS, analyzing the temporal stability and factorial structure of these instruments. The study was conducted with 368 Norwegian workers and found low correlations between the scales (indicating weak convergent evidence), weak temporal stability of scores, and a four-factor internal structure for both the WorkBAT and the WART, while the DUWAS showed a two-factor solution. The authors concluded that the three measures could be used interchangeably. This coincides with the findings of the present review, since none of the measures clearly outperformed the others.
The present review has some limitations. First, it only synthesized articles written in English or Spanish; articles written in other languages were not included. Second, three instruments measuring one side of HWI were analyzed, which, although they are the main ones, are not the only ones, since other tests were also observed during the selection of the final articles, although in a smaller proportion (for example, the Workaholism Analysis Questionnaire [WAQ] or the Bergen Work Addiction Scale [BWAS]). Third, only articles published up to 2020 were analyzed. However, in 2021, one psychometric study was published that presents a tool specific to HWI, considering Time Commitment and Work Intensity as dimensions [62]. These limitations should be taken into consideration in future studies that analyze other instruments, which can help to clarify the functioning of a construct whose research is developing rapidly.
This systematic review has theoretical and practical implications related to HWI. At a theoretical level, the review of the three instruments sheds light on different aspects of HWI, specifically workaholism and work addiction, exploring how they are theoretically related to other variables, such as job satisfaction and engagement. In this way, it supports the construction of a nomological network for HWI, where different organizational and mental health constructs are explored. At a practical level, the implications lie in knowing the strengths and weaknesses of the three measures analyzed, in order to identify which versions have the most empirical support and which are still to be studied.
Likewise, the article contributes to the HWI research line, especially in the current context of a health emergency, where teleworking conditions have made the line between personal and professional life increasingly thin or, in some cases, have made it disappear. Therefore, it is necessary to have good measurement tools that can capture relevant details of HWI in workers who have high levels of time and effort investment. In this sense, the findings obtained suggest that the DUWAS, the WorkBAT, and the WART can be used to measure HWI. However, the DUWAS presents a slightly better psychometric performance than the other two scales, so it should be considered as a first option, although the decision should also be linked to other factors such as the objective of the measurement, the components to be explored, and the context of the application.
These findings suggest that HWI has consequences in the organizational field and applied settings (e.g., occupational health or human resources). Moreover, integral to theory development is the ability to differentiate a construct from its antecedents and outcomes. Therefore, developing a thorough understanding of the nature of the HWI phenomenon and its consequences requires the use of sound and appropriate psychometric tools to measure the construct.

Conclusions
The present systematic review constitutes the first effort to summarize the psychometric properties of three popular HWI measures (WorkBAT, WART, and DUWAS). In this way, the review provides empirical evidence of the performance of the three instruments so that researchers, psychologists, managers, and other interested parties can choose the measure that best suits their needs and objectives of use. According to the findings obtained, the three HWI measurement instruments have similar performance, so it is advisable to conduct additional psychometric studies focused on other sources of validity evidence, analysis of bias due to certain sociodemographic variables, or with more robust psychometric models (for example, from the IRT or RMT). Finally, based on this review, it is possible to propose new lines of research focused on covering under-explored or unexplored psychometric aspects of the WorkBAT, WART, and DUWAS.