Research Online
Toward a Valid Measure of E-Retailing Service Quality

Abstract
E-retailers are major players in the field of electronic commerce and their success would seem to depend on service quality, because they are selling the same products that traditional retailers sell. This article critiques Collier and Bienstock's [5] new measure of e-retailing service quality and shows how the stages of e-retailing service quality can be more validly measured by adopting Rossiter's [12] C-OAR-SE procedure for scale development. Collier and Bienstock's measure is insufficiently valid because the measure (1) fails to specify the hierarchical objects that form the construct, and measures the overall object, e-retailing, wrongly by focusing on completed transactions; (2) does not fully acknowledge the hierarchy of attributes that form the construct and operationalizes these attributes wrongly as "reflective" when at all four levels they are "formed"; (3) inappropriately represents the rater entity by using college student participants; (4) employs unnecessarily numerous, often redundant, and sometimes ambiguous scale items, with Likert-type answer scales that make the observed scores managerially almost uninterpretable; and (5) tries to measure overall e-retailing service quality when it makes sense only to measure the separate quality ratings of sequential stages of the e-retailing service process. The article points out how these problems could be avoided by constructing a new measure that properly applies the C-OAR-SE procedure.


Introduction
E-retailing is the largest marketing activity in the rapidly growing field of electronic commerce and, logically, perceived service quality would seem to be the key success factor that lifts this new form of retailing above traditional retailing because the products the two types of retailers sell are the same. However, e-retailing service quality must be validly measured if its actual role is to be demonstrated empirically. Moreover, a valid measure of service quality at each stage of e-retailing is required if e-retailers seek to improve their service quality.
It is therefore crucial that a valid instrument for measuring e-retailing service quality in its main stages be developed. The new instrument needs to be very highly content-valid [12] because only then can it be used to test theory, such as whether e-retailing service quality really does cause repeat patronage, and only then can it be useful in practice, assuming that empirical research confirms the theory that it does cause repeat patronage, by showing e-retail managers precisely what components of service quality need to be changed, and by how much, to retain customers [14], [18]. In addition, the new scale, which will actually be several scales, needs to be as concise as possible without sacrificing coverage of the main components, because only a relatively concise scale will be used by practitioners and by those academics who cannot afford the questionnaire space that a lengthy instrument requires.

Collier and Bienstock's New Measure
Collier and Bienstock [5] developed a new multiple-item measure of e-retailing service quality which they claim to be more valid than previous measures. Their instrument consists of 54 items. They claim that it has better content validity than the academic E-S-QUAL and E-RecS-QUAL scales developed by Parasuraman, Zeithaml, and Malhotra [10] and the practitioner scales used by several leading internet rating services such as Consumer Reports' E-Ratings, BizRate.com, and Worldbestwebsites.com. Collier and Bienstock's new instrument consists of three separate scales, compared with Parasuraman, Zeithaml, and Malhotra's two, measuring three hypothesized stages of e-retailing service. Despite referring to e-retailing service quality overall, they do not use the full scale, the 54 items, to compute a total e-retailing service quality score. The separate scales refer to sequential stages of e-retailing, which they call process (which really refers only to the website visit), outcome (receipt of the delivered product), and recovery (a stage which occurs only if a complaint is made to the e-retailer).

C-OAR-SE Critique
The present article applies the C-OAR-SE procedure for scale development [12] to demonstrate that Collier and Bienstock's instrument for measuring e-retailing service quality [5] is not sufficiently content-valid and explains how a highly valid and briefer instrument can be developed. The C-OAR-SE procedure is also used to propose a more valid measure of Collier and Bienstock's ultimate dependent variable, future patronage intention. Rather ironically in light of the present critique, Collier and Bienstock attempted to go beyond the conventional scale development procedure [4] applied in most previous academic measures of service quality, notably the SERVQUAL types of measures, by applying the C-OAR-SE procedure. However, as the present article explains, Collier and Bienstock in fact followed the conventional procedure and made the same mistakes as did the previous academic researchers.
C-OAR-SE (an acronym for Construct definition, Object classification, Attribute classification, Rater identification, Scale formation, and Enumeration) is a rational, expert judgment procedure, not an empirical, statistical one. It can be shown logically, using C-OAR-SE, independently of the empirical data they report, that Collier and Bienstock's measure is not sufficiently valid.
To develop a highly valid measure of e-retailing service quality, the full C-OAR-SE procedure should have been applied, in six steps, which form the headings of the present article:
1. Definition and measurement of the object of the construct.
2. Definition and measurement of the attribute of the construct.
3. Definition and sampling of the rater entity.
4. Content coverage of the formative items for the component attributes.
5. Content of the question part and the answer part of the items.
6. Enumeration (scoring) to improve predictive validity and managerial usefulness.

Stage Components
Collier and Bienstock [5], in their expansion of the construct, identify the likely stage components but they mistakenly describe these as components of the attribute, service quality, instead of components of the object, e-retailing. The stage components of e-retailing, renamed more accurately than Collier and Bienstock named them, are (1) the website visit; (2) the transaction with the e-retailer, if the visit culminates in a transaction or an attempted transaction; and (3) recovery, if a transaction has ensued and if the customer complains about a problem with delivery or with the merchandise or service purchased. Collier and Bienstock correctly argue that Parasuraman, Zeithaml, and Malhotra [10] unjustifiably collapse the first two stages into one, in E-S-QUAL. The present author contends that there are, in e-retailing, five potential stages: (1) the website visit; (2) the transaction attempt (which too often is aborted, as found by Sismeiro and Bucklin [15]); (3) the customer assistance phone contact (the usual response when an online transaction attempt fails); (4) the transaction outcome (which is the actual stage that Collier and Bienstock measured); and (5) recovery (if needed). However, the present critique will proceed as though Collier and Bienstock's three stages are adequate.

Constituents of the Object
Collier and Bienstock [5] sampled (and thus measured) the constituents of the object wrongly. They sampled as the constituents of the e-retailing object "the last e-retailer with which [each customer in the survey] enacted a transaction" (p. 267, emphasis added). Restriction of the sample to (completed) transactions must underrepresent the type of e-retailer whose site is primarily accessed for pre-purchase "information," with the transaction then enacted offline via another direct-response medium, such as going to a travel agent or telephoning an airline directly to purchase tickets after checking schedules and prices on airlines' websites. Another constituent type of e-retailer underrepresented would be those companies whose websites are primarily accessed for "entertainment," even though their products can be purchased online. Also, providers of search engines and recommender systems are often an important constituent of e-retailing (Steckel, Winer, Bucklin, Dellaert, Dreze, Haubl, Jap, Little, Meyvis, Montgomery, and Rangaswamy [16]), and search engines are mentioned as a constituent by Collier and Bienstock in their conceptual discussion of e-retailing (p. 264), but their instrument has no item to represent them. The conceptual definition of the object e-retailing requires that the constituent types of e-retailers be specified and sampled.
Less than 2% of e-retail website visits result in a purchase [19]. Collier and Bienstock's sampling of completed transactions means the probable omission of e-retailers whose websites were of such poor "quality" that the visitor did not proceed to the transaction stage. This omission would have the effect of truncating (cutting off the bottom of) the distribution of website service quality (Collier and Bienstock's process quality) scores in their empirical results and making it less likely that website service quality would predict any further behavior such as whether, in fact, the transaction (Collier and Bienstock's outcome) stage occurs.
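The range-restriction argument can be made concrete with a small simulation (entirely hypothetical numbers, not Collier and Bienstock's data): if only the small fraction of visitors with the highest perceived website quality goes on to transact, the variance of quality among sampled transactors collapses, and with it the attainable correlation with any downstream behavior.

```python
import random

random.seed(42)

# Hypothetical illustration of the truncation effect (assumed numbers, not
# Collier and Bienstock's data). Perceived website quality in the full
# population of visitors is simulated as standard-normal scores.
n = 10_000
quality = [random.gauss(0.0, 1.0) for _ in range(n)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Assumed selection rule for the sketch: only visitors whose perceived
# quality is in the top 2% complete a transaction (cf. the <2% figure [19]).
cutoff = sorted(quality)[int(0.98 * n)]
transactors = [q for q in quality if q > cutoff]

# Sampling completed transactions truncates the bottom of the quality
# distribution, so far less variance remains to predict later behavior.
print(variance(quality) > 4 * variance(transactors))  # True: variance collapses
```

Under these assumed numbers, the quality scores of transactors retain only a small fraction of the population variance, which is why website quality measured on completed transactions alone is a poor predictor of whether the transaction stage occurs.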

The Transaction Stage
Collier and Bienstock [5] also measured one of the component objects of the overall object wrongly. In their operational measure of the transaction stage object, some of the questions (items 27, 28, 32 and 33 in particular, in the Appendix to Collier and Bienstock's article and reproduced in the present Appendix, later) do not allow this to be the first transaction with the e-retailer. These questions instead assume that the consumer is already a repeat customer and so not all of the items for the transaction stage can be validly answered by a first-time customer. It may be noted that many of the fulfillment items in Parasuraman, Zeithaml, and Malhotra's [10] E-S-QUAL scale suffer from the same limitation.

Correct Procedure for Object Measurement
From the perspective of the C-OAR-SE procedure, the fourth-order object, e-retailing, should have been sampled from the experiences of customers, first-time and repeat, of a representative collection of constituent types of e-retailers. The component objects, which are third-order and remain the same at the second-order level of objects, namely the website visit, the transaction, and recovery, should have been explicitly recognized as separate objects, rather than as part of separate attribute descriptions, so that experts could more readily decide whether these objects constitute the correct stages of e-retailing. In fact, these separate objects share the same attribute which is, of course, service quality. The nature and measurement of the attribute of the construct is considered next.

Overall Attribute
The overall attribute of the construct e-retailing service quality as perceived by college student customers (who are the rater entity; see section 3) is service quality. As realized by Collier and Bienstock [5] when they cited the C-OAR-SE article, perhaps helped by the article's discussion of how to classify service quality as an attribute, service quality is a higher-order, formed attribute. However, Collier and Bienstock consider service quality to be second-order formed whereas, just as was the case for e-retailing as an object, it is actually a fourth-order formed attribute. It is formed (made up), potentially, by summing the scores from several third-order components. These third-order components differ according to the component object being rated, not the attribute, service quality, which is the same for all three, and are designated in the present article as website service quality, transaction service quality, and recovery service quality.
Where the "formed" idea comes in, according to Collier and Bienstock, is that the attribute scores on these three third-order components could be summed to form the total service quality attribute score. However, the three component scores do not actually need to be summed. It only makes sense to do so in the case of customers who experience the next two stages of e-retailing after a website visit. Even then, the separate scores resulting from treating the stage quality measures as separate scales would be more diagnostic than a summed score. Collier and Bienstock do not sum them in their empirical study and therefore never actually apply service quality as a formed attribute despite calling it one.
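The formed-scoring logic described here can be sketched as follows (the item scores and stage compositions are hypothetical, for illustration only): each stage score is simply the sum of its item scores, and an overall sum is computed only for raters who experienced all three stages.

```python
# Sketch of formed (formative) scoring, per C-OAR-SE: a component score is
# the sum of its defining item scores, not a factor extracted from them.
# Item labels and score values below are hypothetical, for illustration.
from typing import Optional

def stage_score(item_scores: list) -> int:
    """A formed stage score is simply the sum of its item scores."""
    return sum(item_scores)

def overall_score(website, transaction=None, recovery=None) -> Optional[int]:
    """Sum the stage scores only for raters who experienced every stage;
    otherwise an overall e-retailing service quality score is not meaningful."""
    stages = [website, transaction, recovery]
    if any(s is None for s in stages):
        return None
    return sum(stage_score(s) for s in stages)

# A visitor who never transacted gets a stage score but no overall score:
print(overall_score(website=[2, 1, 2]))               # None
print(overall_score([2, 1, 2], [1, 2, 1], [0, 1, 1]))  # 11
```

Keeping the stage scores separate, as argued above, is also the more diagnostic choice: the per-stage sums remain available even when no overall sum is defensible.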

Misapplication of Formed Attribute Theory
The conceptual error that Collier and Bienstock [5] make thereafter is to claim to follow formed-attribute theory as in the C-OAR-SE procedure, but to depart from it when operationalizing the measure. Perhaps without realizing it, they ended up following conventional theory, no differently from the previous researchers in service quality. They made two mistakes. The initial mistake was in the way they measured the 11 second-order component attributes. The second-order component attributes, renamed more accurately here with their objects, are website ease of use, website privacy, website design quality, website information trustworthiness, website functionality (all relating to the website object); condition of received order, timeliness of received order, accuracy of received order (all relating to the transaction object); and recovery personnel competence, recovery procedure quality, and recovery outcome fairness (all relating to the recovery object). Collier and Bienstock identified the 11 components a priori but then they took all of the items in their initial battery of items and threw them into a factor analysis, with orthogonal factor rotation. It is simply not believable that the same 11 hypothesized components would emerge as 11 orthogonal factors. If this is what they did, and their description of the statistical procedure (p. 268) is too vague to tell, the temptation to reidentify the components ex post from the factors must have been great. Alternatively, what they may have done is looked for three factors corresponding to the third-order stage quality components. This would be just as incorrect conceptually, for two reasons. One is that there is no reason why the stage quality factors should be uncorrelated (orthogonal) or even that they should be factors if formed-attribute theory (a component theory not a factor theory) is followed. 
The other reason is that the five, three, and three second-order components that make up (form) the three respective third-order stage components do not have to unidimensionally "load" on them. Factor analysis is a statistical procedure that ignores the conceptual requirement that the five, three, and three components form the respective stage quality attribute scores, and the components of a formed attribute do not have to be, and indeed should not be, unidimensional (see [12], p. 315). Thus, Collier and Bienstock intended to follow formed-attribute theory, as in C-OAR-SE, but in truth applied the conventional eliciting-attribute theory in deriving the 11 components, just like the previous researchers did for their components.
The second mistake made by Collier and Bienstock, and by the previous researchers, is to regard the 11 second-order component attributes themselves as eliciting (warranting "reflective" indicators in conventional terms, see p. 267 of their article) and failing to realize that these, too, are formed attributes. For example, the second-order component object and attribute website ease of use is clearly made up of the ratings on its items and does not cause them, as the reflective-indicator specification adopted by Collier and Bienstock assumes. The items, in turn, in some cases redundantly, represent first-order components, each with its own object and attribute. In wrongly assuming that the 11 second-order component attributes were eliciting ("reflective"), Collier and Bienstock made the conventional mistake (see [12], p. 315) of computing coefficient alpha and deleting potentially defining items. It may be noted that this same mistake was made in E-S-QUAL and E-RecS-QUAL earlier [10] and in this regard Collier and Bienstock's measure is no better. The "formative" versus "reflective" distinction is not just a minor difference of statistical treatment but rather a major conceptual choice that, if made wrongly, reduces the content validity of the entire measure.
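A minimal numeric sketch (hypothetical item scores, not the authors' data) shows why coefficient alpha is the wrong criterion for a formed attribute: alpha rewards inter-item correlation, so deliberately non-redundant defining items can yield a very low alpha even though deleting any of them would strip defining content.

```python
def cronbach_alpha(items):
    """Coefficient alpha for a list of item-score columns (standard formula)."""
    k, n = len(items), len(items[0])
    def var(xs):
        m = sum(xs) / n
        return sum((x - m) ** 2 for x in xs) / (n - 1)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Three deliberately non-redundant formative items (hypothetical scores for,
# say, ease of navigation, security, and site speed across six raters):
ease     = [5, 4, 5, 2, 3, 4]
security = [3, 5, 2, 4, 5, 2]
speed    = [4, 2, 5, 3, 2, 5]
alpha = cronbach_alpha([ease, security, speed])
# Alpha is very low (negative here) because the items barely intercorrelate,
# yet each could be a defining component of website service quality; deleting
# "low-alpha" items would remove defining content, not improve the measure.
print(alpha < 0.5)  # True

# By contrast, perfectly redundant items maximize alpha while adding nothing:
print(round(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]), 2))  # 1.0
```

The contrast is the point: for a reflective measure high alpha is desirable, but for a formed index it merely signals item redundancy of the kind criticized later in this article.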

Correct Procedure for Attribute Measurement
What should have been conceptualized and measured, according to C-OAR-SE theory, is a totally formed hierarchy: the first-order item scores form the second-order attribute component scores; the second-order scores form the third-order attribute component scores; and those scores, in turn, form the fourth-order overall service quality attribute score (if it is appropriate to use the overall sum score). The complex structure of the attribute and its components, as well as their object components, as they should have been conceptualized, is shown in Figure 1.

Rater Entity
The third essential element in the conceptual definition of a construct, according to C-OAR-SE, is the rater entity. Critics of the C-OAR-SE procedure have argued against the inclusion of the rater entity in the construct (see especially [6]) but the Collier and Bienstock study [5] shows why it is essential. The complete label for the construct that Collier and Bienstock measured is e-retailing service quality as perceived by college student customers. The rater entity, college student customers, makes it a different construct than if the rater entity were, for example, the e-retail manager or an expert panel. If they were the same construct, then, logically, the measures of them would have to produce identical scores, regardless of the rater entity. This could happen empirically in a given dataset if all three rater entities happened to have the same perceptions, but there is no theoretical reason why they should. From the perspective of C-OAR-SE, consumers' quality ratings are one construct and managers' quality ratings and experts' quality ratings are others.

Wrong Sample of Raters
Collier and Bienstock [5] made an operational mistake when sampling the rater entity because they clearly intended the rater entity, the target population of raters, to be e-retail customers in general (their abstract and discussion imply this) but they settled for young adult e-retail customers and did not sample this narrower population properly. They chose a college student sample of raters because "the young adult population was the most active Web users [and] that sampling college students will allow us the best chance to represent the characteristics of online consumers" (p. 267). This choice is questionable because college students do not represent a random sample of young adults or even of frequent site-user young adults. In any case, Collier and Bienstock's justification for sampling young adults was that they are the most active web users, not necessarily the most active users of e-retailers. From the rater-entity perspective of C-OAR-SE, Collier and Bienstock measured a different construct, namely e-retailing service quality as perceived by college student customers. Importantly, Collier and Bienstock's college student sample is not a broad enough sample of raters upon which to test theoretical relationships or make empirical generalizations about e-retailing service quality.

Correct Sampling of the Rater Entity
A random sample of all e-retail customers should be taken for the rater entity and, if the researchers wish to learn about e-retailing success factors, the sample should be stratified by non-customers (website visitors only), first-time customers, occasional customers, and frequent customers of the e-retailer.

Right Stages?
Collier and Bienstock [5] identified the third-order objects and attributes of website service quality, transaction service quality, and recovery service quality, recalling that these are renamed in the present article, "from the analysis of both academic and practitioner literature" (p. 263). One might question their apparent need to rely on the literature to identify the object components of the formed object e-retailing when common knowledge would have sufficed. Collier and Bienstock are undoubtedly themselves customers of e-retailers and introspection would have suggested that contact with e-retailers has at least three separate stages, the last two of which are only potential: interaction on the website, which may or may not culminate in a purchase transaction; the transaction itself, if there is a transaction made online; and, if a problem occurs with the outcome, and the customer reports it, there will be a recovery stage. Where Collier and Bienstock made a contribution was in distinguishing the three stages (although the present author believes they should have distinguished five; see section 2.2 earlier). Parasuraman, Zeithaml, and Malhotra's E-S-QUAL measure [10], for example, wrongly included the transaction stage in the website visit stage, as the fulfillment factor, when clearly there may not be any fulfillment.

Right Second-Order Components?
A further problem arises with Collier and Bienstock's [5] identification of the 11 second-order attribute components, listed earlier and in Figure 1, five of which are nested under the website service quality component, three under the transaction service quality component, and three under the recovery service quality component. These 11 "dimensions" (and there is a problem right here, because the word "dimension" implies that the attribute component is "reflective" when it is in fact formed and should be called a "component") are again claimed to be "based on practitioner and academic literature" (p. 264). However, their summary of the previous literature in their Table 1 includes components that do not appear in their final list; for example, system availability of the website (one of the components in E-S-QUAL), or range of product (or service) options on the website (which is in none of the academic measures but is in the practitioners' E-Ratings and Bizrate measures). Their list of component attributes for the transaction stage omits the start of the transaction. Important components omitted are ease of inputting personal details [15] and ease of payment, which can vary considerably even with a credit card as the usual mode of payment. Purchase attributes are essential components of the transaction stage, not the website stage, even though they occur on the website, because consumers can interact with a website without purchasing.

Right First-Order Components (Items)?
As pointed out previously, Collier and Bienstock [5] failed to realize that the items they selected to represent the 11 "dimensions" themselves represent components: first-order components. According to C-OAR-SE, an item has an object part and an attribute part, and these first-order component attributes also have first-order component objects. For example, the items representing the attribute of (website) design (an attribute that is more correctly labeled as design quality; see the very good discussion of concrete cues versus perceptual attributes in [10]) have, as their respective first-order objects, visual design (items 10 and 11), graphics (item 12), text (item 13), and page width (item 14). The identification of first-order objects illuminates the mistake that Collier and Bienstock [5] made in conceptualizing each of the 11 "dimensions" as being reflective and then using factor analysis and stepwise deletion of items based on coefficient alpha. They dropped an unreported number of items (perhaps none?) from the five website service quality factors, three items (one each) from the three transaction service quality factors, and an astounding 31 items from the three factors representing recovery service quality (p. 268). Presumably, some or all of these deleted items were defining items, referring to first-order objects and attributes that make up the second-order attribute of design quality, and so forth, and are necessary for adequate content coverage of the construct. The defining items would also be necessary to form a valid diagnostic instrument for e-retailer practitioners.

Correct Procedure for Identifying Components and Generating Items
According to the C-OAR-SE procedure, for formed attributes, which the 11 "dimensions" should be, concrete (unambiguous) items should be generated from primary research consisting of qualitative interviews (semi-structured questions and open-ended answers) with a cross-section of e-retailers' prospective and actual customers. The items should then be categorized by two or three judges (experts in the domain of e-retailing) into first-order components, the main first-order components retained, and one content-valid item retained, or newly written if necessary, per first-order component (see [12], pp. 314-315). These items make up the scale. The next section suggests that there will be far fewer than the 54 items used by Collier and Bienstock [5].

Item Wording
Item wording (the question part of the item) is a fundamental yet much neglected aspect of content validity (see [12], pp. 320-324). Reviewers of new scales typically focus on scale statistics without bothering to examine the items on which the statistics are based. Researchers, in developing a new scale, typically seem to believe that a large number of often poorly worded items will somehow "cancel out" content errors, an approach apparently adopted by Collier and Bienstock [5], who state that "we have included numerous items for each [of the 11 'dimensions,' the second-order components] to ensure that content validity was achieved" (p. 269). The 54 questions in Collier and Bienstock's instrument, as well as the seven questions used to measure their two dependent variables, are reproduced in the present article as an Appendix (the questions are verbatim from the questionnaire in the Appendix to their article, pp. 272-273, but with new labels for the sets of questions). Inspection of the 54 items overall reveals that some of the questions present redundant content (e.g., items 6, 8 and 9; items 27 and 28; items 37, 39 and 43; and items 46 and 48). Redundancy of content has the effect of overweighting the first-order components that the items represent in the total score for the second-order component (the "dimension"). Redundant items also unnecessarily lengthen the instrument. Item 30 is logically incompatible with item 29 and should be dropped. The content suitability of the items for the separate service quality stages is assessed next, including these problematic items.

Website Service Quality
The content of the questions for the website service quality components (items 1 to 25) looks reasonably good, perhaps because most of the previous studies consulted by Collier and Bienstock [5] focused on website quality, and, most important, the specific (first-order) attributes in the questions do mostly relate to the formed (second-order) attribute, service quality, such as ease of navigation (item 1), security (item 7), readability (item 13), availability (item 15), objectivity (item 20), and site speed (item 24).

Transaction Service Quality
At least four of the questions for transaction service quality (items 27, 28, 32 and 33) have inappropriate content. As pointed out earlier, these questions are about past transactions with the e-retailer instead of being about the most recent transaction, to which the other questions in the instrument refer (and which was the basis for the sampling of events, or transaction objects, in Collier and Bienstock's test of the instrument [5]). These questions assume that there have been past transactions and that the respondent is a repeat customer (the questions could not be meaningfully answered by first-time, and thus single transaction, customers). Use of the wrong attitude object, past transactions, has the likely result of nonsensical middle-of-the-scale "neither" answers for these attributes. Moreover, all of the items refer to products bought online and do not apply to online services, especially services purchased where there is no physical product other than the service receipt, such as e-tickets for air travel.

Recovery Service Quality
The 20 questions for recovery service quality (items 35 to 54) are almost surely too many, and this is after deleting 31 items due to inappropriate use of the "reflective" attribute operationalization. Ten items alone were used to measure the recovery personnel competence (Collier and Bienstock's [5] interactive fairness) component and it is difficult to accept that each is an important defining item. Qualitative interviews and expert judgment could have been used to identify the really highly important first-order components of the recovery personnel competence component and it is likely there would be no more than five or six. On the plus side, the items seem to relate to the overall quality attribute but to label this attribute as fairness, as Collier and Bienstock did, is dubious. On the other hand, the underlying attribute for recovery outcome fairness (Collier and Bienstock's outcome fairness) component is arguably fairness, not quality, but fairness must be convincingly argued to be a sub-attribute of service quality, otherwise the scores cannot meaningfully be summed to form the total recovery quality score (see [9] for the theoretical argument that not only outcome fairness but also customer-favored equity of the outcome are necessary components of service quality). However, there is still a content problem with the four questions (items 45 to 48) for the recovery outcome fairness component because one of them is a summary evaluation item rather than a concrete first-order component item (item 46: "The outcome I received was fair"). Factor-analysts have long warned about the inclusion of summary items among specific items because the summary item will almost always "attract" specific items to form a factor (e.g., [7]). The same problem occurs among the six items measuring recovery procedure quality because item 54 asks for a summary evaluation: "Overall, the e-retailer had a good procedure for dealing with complaints." 
Questions 46 and 54 should be dropped, further shortening the instrument.

Likert Answer Scales
Moreover, Collier and Bienstock's [5] use of Likert answer scales, which are advised against in C-OAR-SE, means that the attribute part of the item is built into the question part instead of the answer part (see [12], pp. 322-324). For example, consider item 3: "This e-retailer contains a site map with links to everything on the site." The first-order object here is a site map with links to everything on the site and the first-order attribute is contains. This question calls for a "yes" or "no" answer; it cannot be properly answered by indicating Likert-type "degrees of agreement or disagreement." An example of a needlessly complicated and ambiguous item is item 6: "I trust the Website administrators will not misuse my personal information." The object in this item is the administrators of the website and the attribute could be read as double-barreled, involving whether the rater trusts the administrators and whether the administrators will misuse personal information. A conceptually simpler version, though still in the unsuitable Likert question-and-answer format, would be "The Website administrators are likely to misuse my personal information," "strongly disagree… strongly agree."

Correct Item Format
According to the C-OAR-SE procedure ( [12], p. 323), the items should have a question part that identifies only the object, and then an answer part whose answer categories represent degrees of the attribute. In the case of e-retailing service quality items, the degrees of the attribute should represent a unipolar continuum. Qualitative research with consumers is necessary to specify the answer categories correctly in terms of how many and what categories of the attribute the raters can meaningfully distinguish. An example of the correct question-and-answer format for item 11 would be:

This e-retailer's Website design is:
□ not at all innovative
□ moderately innovative
□ very innovative
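Such unipolar answer categories enumerate straightforwardly. The sketch below assumes, purely for illustration, a 0-1-2 coding and that several defining (formed) items happen to share the same three-point wording; in practice the number and wording of categories would come from the qualitative research the procedure calls for.

```python
# Hypothetical enumeration of a unipolar answer scale (0-1-2 coding is an
# illustrative assumption, not part of the C-OAR-SE prescription).
CATEGORIES = {
    "not at all innovative": 0,
    "moderately innovative": 1,
    "very innovative": 2,
}

# One respondent's answers to three formed (defining) website-design items,
# assuming each item uses the same three-point unipolar answer wording.
answers = ["very innovative", "moderately innovative", "very innovative"]

# A formed attribute is scored by summing its defining items, not by
# averaging interchangeable "reflective" indicators.
component_score = sum(CATEGORIES[a] for a in answers)
print(component_score)  # 5
```

The point of the sketch is that each item contributes distinct, non-redundant content to the component score, which is why summation (rather than alpha-maximizing averaging) is the appropriate enumeration for formed attributes.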

Enumeration (Scoring)
Enumeration, which is the final step in the C-OAR-SE procedure, is a conceptual step, not a mechanical, numerical formality in the application of the scale (see [12], pp. 324-326). The three components of e-retailing service quality, which are website service quality, transaction service quality, and recovery service quality, should be scored, and used, only as three separate constructs. The three stage quality scores should be used as sequential predictors: the website service quality score should be used to predict whether or not consumers proceed to a transaction (see [15]), and the transaction service quality score should be used to predict whether or not consumers will experience a problem that will engage the recovery stage. For those consumers who proceeded to a transaction, website service quality and transaction service quality can be used as multiple predictors, and recovery service quality used as an additional predictor for those who complained, to predict future transaction intention. Discriminant analysis, binary probit regression or, probably the best technique, the "equity estimator" [11], would be appropriate for testing these predictions.
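The scoring and stage-filtering logic described above can be sketched as follows. The respondent data are hypothetical, and the actual probit or equity-estimator fitting would of course be done in a statistics package; the sketch only illustrates keeping the three stage scores separate, filtering respondents by stage for the sequential predictions, and enumerating a verbal intention answer into a probability using the Wallsten, Budescu, and Zwick values quoted in this article.

```python
# Sketch of the enumeration (scoring) step: the three stage-quality scores
# are kept as separate constructs and used as sequential predictors, never
# summed into one overall e-retailing service quality score.
# All respondent data below are hypothetical.

# Enumeration of verbal future-transaction-intention answers into
# probabilities (Wallsten, Budescu, and Zwick's scale [17]).
JUSTER = {
    "impossible": 0.00, "unlikely": 0.15, "slight chance": 0.30,
    "toss up": 0.50, "likely": 0.70, "pretty sure": 0.80, "certain": 1.00,
}

# (website_q, transacted, transaction_q, complained, recovery_q, intention)
respondents = [
    (2.5, True,  2.0,  False, None, "likely"),
    (1.0, False, None, False, None, "unlikely"),
    (2.8, True,  1.2,  True,  0.8,  "slight chance"),
    (2.2, True,  2.6,  False, None, "pretty sure"),
]

# Stage 1: website quality predicts whether the consumer transacts at all.
stage1 = [(w, int(t)) for (w, t, *_rest) in respondents]

# Stage 2: among transactors only, website quality and transaction quality
# predict whether a problem engages the recovery stage.
stage2 = [(w, tq, int(c)) for (w, t, tq, c, _rq, _i) in respondents if t]

# Dependent variable: future transaction intention as a probability.
intention_p = [JUSTER[i] for (*_scores, i) in respondents]
mean_p = sum(intention_p) / len(intention_p)
print(round(mean_p, 4))  # 0.4875
```

Note that `stage2` deliberately drops the non-transactor, mirroring the argument that recovery (and even transaction) quality can only be rated by, and predicted for, consumers who actually reached that stage.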

Problems with the Other Constructs
In this connection, it should be pointed out that the satisfaction construct (just to label the attribute rather than the total construct) that Collier and Bienstock [5] included as a dependent variable was over-measured, with four redundant items (items 55 to 58 in the present Appendix table) used to produce a high and unnecessary coefficient alpha. Why the researchers would want to measure satisfaction in the first place is questionable given that this variable is, in effect, just a proxy for overall service quality ( [14], p. 63). This is evidenced by the single summary item, item 56, "In general, I was pleased with the quality of service this e-retailer provided." Not only was there no need for an overall service quality variable but its inclusion in the regressions could have contributed to the lack of prediction exhibited by transaction service quality and recovery service quality by "suppressing" these two variables in predicting behavioral intentions.
Considering finally Collier and Bienstock's ultimate dependent variable, behavioral intentions, this construct should be reconceptualized as the consumer's future transaction intention, singular. The only relevant item of the three that Collier and Bienstock used is item 61: "I intend to purchase from this e-retailer in the future." The item before it, item 60, "I intend to continue to visit this e-retailer's site in the future," taps behavioral intention for only the first stage of e-retailing, which is a visit to the website. The item before that, item 59, "I will recommend this e-retailer to my friends," represents a different behavior altogether and should be analyzed separately (incidentally, three of the five items in Parasuraman, Zeithaml, and Malhotra's loyalty intentions measure ( [10], p. 231) refer to recommendations and thus seriously overweight this different behavior). That WOM recommendations are a different behavior is shown by the scoring of word-of-mouth intention across individuals as net WOM (positive WOM minus negative WOM), which appears to provide a unique predictor, unlike other intended behaviors, of future sales growth [8]. Conceptually, the three items should not have been summed into a single score and, practically, the composite score is meaningless. The answer scale for future transaction intention should be one that can be enumerated into probabilities, such as the well-known Juster scale or Wallsten, Budescu, and Zwick's [17] scale as recommended in C-OAR-SE ( [12], pp. 323-324), which is "impossible" (0), "unlikely" (.15), "slight chance" (.30), "toss up" (.50), "likely" or "good chance" (.70), "pretty sure" (.80), and "certain" (1.00). The insufficiently content-valid measure of intention may be another reason why Collier and Bienstock found the implausible result that transaction service quality failed to predict their dependent variable of behavioral intentions.

This e-retailer gives the customer numerous payment options.
The e-retailer provides a confirmation of items ordered.

TRANSACTION SERVICE QUALITY

Order Condition
26. This e-retailer's orders are protectively packaged when shipped.
27. All orders by this e-retailer are delivered undamaged.
28. Damage rarely occurs during transportation of my order from this e-retailer.

The time between placing and receiving an order is short.
31. This e-retailer is able to respond to a rush order.

Order Accuracy
32. My orders from this e-retailer rarely contain the wrong items.
33. My orders from this e-retailer rarely contain incorrect quantities.
34. This e-retailer's billing is accurate.