In the late 1990s, the first online pharmacy appeared in the United States, selling over-the-counter and prescription drugs. By 2009, there were about 3000 websites selling prescription drugs, and this number had increased to 35,000 by 2016 [1]. At the same time, more and more patients are buying drugs through pharmaceutical e-commerce sites. According to an FDA survey in 2012, 23% of adult online shoppers had bought drugs online [2]. In China, with the rapid progress of internet technology and the change in consumers’ shopping habits, pharmaceutical e-commerce sites have developed rapidly. From 2012 to 2016, the total transaction volume of drug B2C (Business to Customer) business in China increased by nearly 10 times, with a compound annual growth rate of 77.2%, and reached about 28 billion dollars in 2016. With the promotion of pharmaceutical enterprises and e-commerce websites, it is estimated that drug B2C business will maintain a high compound annual growth rate of 41.9% in the future, and will reach about 100 billion dollars in 2026 in China.
The pharmaceutical e-commerce market holds huge commercial value, which not only attracts extensive attention from pharmaceutical companies and e-commerce websites but also arouses the interest of researchers. The prior literature has mainly focused on how to develop pharmaceutical e-commerce. For example, Orizio et al. [3] argued that policy regulations and individual health literacy were the two main factors affecting the promotion of pharmaceutical e-commerce. Roger et al. [4] found that certification bodies provided useful information for pharmaceutical e-commerce sites and online consumers, which can help consumers make better purchasing decisions. In addition to macro policy factors, understanding individuals’ evaluation of online pharmacies at the micro level and analyzing consumers’ concerns when they buy drugs online can provide an important decision-making basis for the development and improvement of pharmaceutical e-commerce services [5]. By comparing online prices with offline ones, the CIPA survey found that cost saving was the paramount reason for consumers to buy drugs online [5]. Monteith et al. [7] documented that the benefits of shopping for drugs online include convenience and low price. Going beyond lower price and availability, Abanmy et al. [6] conducted a more detailed survey and identified four reasons why consumers buy drugs online: unavailability in the local market, a cheaper price, convenience, and good services such as home delivery and refill reminders by email.
At present, most related research is based on population-level surveys, with data obtained from questionnaires or interviews. However, collecting data through questionnaires is time-consuming, laborious, and inefficient [8]. Besides, the quality of the data obtained from surveys depends on the complexity or length of the questionnaire and the willingness of respondents to participate. The resulting subjective bias makes it difficult to replicate the results of these studies [9]. Moreover, the data obtained from surveys may quickly become outdated. Therefore, it is worthwhile to consider alternative sources of data for finding out consumers’ concerns when they buy drugs online.
Online reviews, as an important source of information, provide an alternative that alleviates these problems [11]. In China, many pharmaceutical e-commerce sites have launched review mechanisms that allow consumers to post online reviews based on their purchase experiences. Compared with traditional market research methods such as questionnaires, online reviews published on platforms are all public, so a large number of reviews can be obtained at a low cost and in a short time [12]. In addition, reviews may better reflect the real opinions of consumers [13], as users participate voluntarily. Furthermore, online reviews are updated in real time, which enables us to grasp changes in consumer concerns in a timely fashion through real-time analysis techniques [14].
In this paper, we examine what consumers say about their experience after buying drugs from online platforms by analyzing a large empirical dataset collected from JD.com (Beijing, China), one of the largest e-commerce sites in China. The findings can be translated into management insights that lay the foundation for pharmaceutical e-commerce sites and drug companies to improve customer satisfaction and corporate performance. Specifically, we first apply the structural topic model (STM) to online reviews to reliably identify consumer concerns about online drug purchases. The STM allows us to incorporate document metadata (e.g., whether reviews are positive or negative) into the data generation process of the corpus, overcoming a limitation of the traditional Latent Dirichlet Allocation (LDA) [11]. This advantage enables us to study the generation of positive and negative reviews by means of rigorous statistical analysis, thereby identifying, among the consumer concerns, the negative topics that make consumers dissatisfied [15]. Finally, considering the dynamics and interdependence of the topics, we also analyze how they change over time.
Our research makes the following two contributions to the literature. The first is our methodological contribution. To the best of our knowledge, we are the first to introduce the STM into online drug reviews, while previous literature was mainly based on the conclusions of questionnaires. Our method helps to reveal and analyze the real voice of consumers who buy drugs online and contributes to the management literature of pharmaceutical e-commerce.
In addition, our research has important implications for management practice. Based on a large number of review samples, we summarize 12 major topics, three of which have rarely been mentioned in existing survey-based studies: expiration date, after-sales service, and product packaging. Based on the unique advantages of the STM, we reveal consumer dissatisfaction and describe how these consumer concerns change over time. For the former, we find that the expiration date and after-sales service of the product are the two most important factors that make consumers dissatisfied, and because online reviews are public, they can also have a significant impact on the reputation of the business. However, these factors are not mentioned in previous studies. For the latter, the prevalence of the dissatisfaction topics in negative reviews shows no obvious downward trend over time, indicating that pharmaceutical e-commerce sites and pharmaceutical enterprises still have huge room for improvement. On the other hand, the prevalence of the drug price topic in positive reviews shows a declining trend over time, indicating that the low-price strategy implemented by pharmaceutical e-commerce sites and drug companies is gradually losing its effectiveness. Instead, customers are paying more attention to the improvement of service quality.
The rest of the study is arranged as follows. In the second part, we review the research literature related to pharmaceutical e-commerce and the topic model. The third part introduces the basic process of data collection and processing. The fourth part describes our data and model setup. The fifth part presents the main results of the study. Finally, the sixth part summarizes our research and analyzes the limitations as well as suggestions for future research.
3. Data Collection and Processing
The data collection and preparation steps adopted in this study are as follows. The first step is to obtain the research data. The online drug reviews come from JD.com, one of the largest B2C e-commerce platforms in China, which hosts a large amount of review data that users can freely access. After purchasing drugs on JD.com, customers can post not only textual reviews but also numerical online ratings. In addition, the available reviews are not limited to recent ones; online reviews posted by customers a few years ago can also be queried. All of these features are convenient for our research. This site has also served as a data source for many previous studies [61]. We write programs in Python to collect the online drug reviews that customers posted on JD.com. The data for building the topic model should cover different hierarchies, such as different brands, different products/services or different categories [14]. Therefore, we crawl a total of 79,328 online reviews of 100 kinds of drugs covering 10 major drug categories on the JD.com pharmaceutical e-commerce module. The collected items include the text content of the reviews, the online ratings and the dates of the reviews. The review dates range from 2016 to 2019. The pharmaceutical e-commerce business on JD.com started in 2016, and the number of reviews in 2016 was too small (less than 0.1%), so we only keep the review data from 2017 to 2019. Finally, we remove duplicate samples and obtain 78,732 online reviews.
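The filtering described above can be sketched as follows. This is a minimal illustration and not the authors' program: the record layout (a dict with `text`, `rating` and `date` keys), the function name `clean_reviews`, and the rule that two reviews are duplicates when all three fields match are assumptions made for the example.

```python
from datetime import date


def clean_reviews(reviews, first_year=2017, last_year=2019):
    """Keep reviews dated within [first_year, last_year] and drop exact duplicates.

    Each review is assumed to be a dict with 'text', 'rating' and 'date' keys
    (a hypothetical layout; the paper does not specify one).
    """
    seen = set()
    kept = []
    for review in reviews:
        # Discard reviews outside the study window (e.g., the sparse 2016 data).
        if not (first_year <= review["date"].year <= last_year):
            continue
        # Assumption: a duplicate is a review identical in text, rating and date.
        key = (review["text"], review["rating"], review["date"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(review)
    return kept
```

The first occurrence of each review is retained and the original order is preserved, so the result does not depend on how the crawler ordered its output.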
The second step is to select our research sample from all the available online reviews. Due to individual selection bias [64] in consumer reviews, online reviews on e-commerce platforms (such as TripAdvisor.com, Amazon.com and ebay.com) follow a positively skewed J-shaped distribution. The shopping process and review mechanisms on JD.com are similar to those on the above websites, so the reviews on JD.com also show a positively skewed J-shaped distribution [61]. In our sample, the percentages of 5-point and 4-point ratings are 79.9% and 5.3%, and those of 1-point and 2-point ratings are 9.3% and 2.8%, respectively. In previous research on e-commerce platform reputation, reviews with 1- or 2-point ratings are defined as negative reviews, whereas those with 4- or 5-point ratings are positive reviews [15]. Positive reviews thus overwhelmingly outnumber negative ones in our sample. When constructing an STM, the sample sizes corresponding to the covariate levels should be as similar as possible [15]. We aim not only to extract topics from drug reviews but also to identify topics that appear significantly more often in negative reviews than in positive ones. Therefore, when using review extremity as a covariate in the STM, it is essential to first filter the reviews so that the numbers of positive and negative reviews are balanced [15], which makes the identification of such topics more reliable [15].
To alleviate the possible undesirable effects of an unbalanced sample [69], we adopt the sample selection method of Hu et al. [15], whose study has a similar background and similar needs. We intend to build a corpus in which the numbers of positive and negative reviews are equal. To address the serious imbalance between positive and negative online reviews, we first define reviews with 1-point and 2-point ratings as negative and only those with 5-point ratings as positive. Then, from the samples obtained in the previous step, we randomly select as many positive reviews as there are negative ones. During the random sampling process, we also keep the numbers of positive and negative reviews equal within each class of drugs. Finally, we obtain a total of 19,054 reviews, which are used to build the topic model.
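The balanced sampling described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: the `rating` and `category` keys and the function names are hypothetical, and taking the smaller of the two group sizes within a category is an assumption that covers the case, not discussed in the text, where a category has fewer 5-point reviews than negative ones.

```python
import random
from collections import defaultdict


def label(rating):
    """Map a star rating to a polarity; 3- and 4-point reviews are left out."""
    if rating in (1, 2):
        return "negative"
    if rating == 5:
        return "positive"
    return None


def balanced_sample(reviews, seed=0):
    """Draw equal numbers of positive and negative reviews per drug category.

    Each review is assumed to be a dict with 'rating' and 'category' keys
    (hypothetical names chosen for this example).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_category = defaultdict(lambda: {"negative": [], "positive": []})
    for review in reviews:
        polarity = label(review["rating"])
        if polarity is not None:
            by_category[review["category"]][polarity].append(review)

    corpus = []
    for category in sorted(by_category):
        groups = by_category[category]
        # Assumption: use the smaller group size so both sides can be matched.
        n = min(len(groups["negative"]), len(groups["positive"]))
        corpus.extend(rng.sample(groups["negative"], n))
        corpus.extend(rng.sample(groups["positive"], n))
    return corpus
```

When every category has at least as many 5-point reviews as negative reviews, all negative reviews are kept and only the positive side is downsampled, which matches the procedure described in the text.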
The last step is text preprocessing. Unlike English text, Chinese text first requires word segmentation. We complete this step by using the Jieba package in Python. Jieba uses a prefix dictionary (trie) structure to scan each sentence efficiently and enumerate all the possible words that its Chinese characters can form; these candidates make up a directed acyclic graph (DAG) for quick lookups. Jieba is one of the most effective word segmentation tools for Chinese text (for more information, see https://github.com/fxsjy/jieba). Then, we remove numbers, punctuation marks and stop words, including those from standard stop word lists and from user-defined word lists (e.g., "drug" and "nice"). This step is also completed in Python.
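The preprocessing step can be sketched as below. The sketch is only illustrative: the function name `preprocess`, its `segment` parameter and the regular expression are assumptions, and the stop word lists actually used in the study are not reproduced here. By default the function segments text with `jieba.lcut`, which returns a list of tokens; a different tokenizer can be passed in for text that is already segmented.

```python
import re

# Matches tokens made up only of digits, punctuation, whitespace or underscores.
_NON_WORD = re.compile(r"[\W\d_]+")


def preprocess(text, stop_words, segment=None):
    """Segment a review and drop numbers, punctuation marks and stop words.

    `stop_words` is a set combining standard and user-defined stop words.
    `segment` is a hypothetical hook for the tokenizer; it defaults to Jieba.
    """
    if segment is None:
        import jieba  # third-party package: pip install jieba

        segment = jieba.lcut
    tokens = (token.strip() for token in segment(text))
    return [
        token
        for token in tokens
        if token and not _NON_WORD.fullmatch(token) and token not in stop_words
    ]
```

For example, `preprocess(review_text, stop_words={"药", "不错"})` would segment a review with Jieba and discard the two listed words along with any numeric or punctuation tokens.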
6. Discussion and Conclusions
Our study aims to analyze what concerns consumers when they buy drugs online by using online reviews. To our knowledge, our study is the first to apply the STM to online drug reviews. Compared with the methods in the existing literature, the STM has obvious advantages in finding and measuring consumer concerns, including satisfaction and dissatisfaction. Firstly, the earlier questionnaire-based research method was not only laborious and ill-suited to obtaining a large number of data samples, but also relied on high-level experts to define a set of standard variables in advance. Our method not only obtains a large number of samples at low cost but also relaxes the requirements for predefined variables and expert knowledge. We embody the user-centric management philosophy by paying more attention to online user reviews. At the same time, we demonstrate the applicability of the topic model to drug reviews, as we captured new topics that were not defined in earlier questionnaires. Secondly, our research uses a structural topic model, which can measure these consumer concerns more effectively than the LDA widely used in previous literature. The STM can incorporate document-level covariates (the extremity of reviews in our study) into the prior distributions of document-topic and topic-word proportions, so that we can conduct richer research and identify, among many elements, the concerns associated with user dissatisfaction. Finally, we make full use of the strengths of online reviews as a data source and analyze the changing trends of topics over time, which is not possible with the questionnaire method.
Our study also provides important practical management implications for pharmaceutical e-commerce websites and enterprises. First of all, by using the STM to identify 12 topics, our research provides a basis for companies to improve services and design marketing strategies. More importantly, we find three topics that rarely appeared in earlier questionnaire-based studies, namely the expiration date of drugs, after-sales service and product packaging, which enriches the information available for enterprise decision-making. Secondly, we add the extremity of reviews as a covariate to the document-topic prior distribution to capture five negative topics, which are the true voice of customer complaints. These five topics are not simply what customers talk about in negative reviews; they are topics that statistically appear more frequently in negative reviews than in positive reviews. This comparison strategy helps researchers and drug-related companies reveal the true voice of customer dissatisfaction [26]. The expiration date of drugs and after-sales service are the two most important factors of dissatisfaction, and they have a significant impact on the reputation of the merchant. However, these two topics are not found in previous literature, which highlights the significance of our research. Finally, we analyze the trends of topics over time. The topic prevalence of the five dissatisfaction factors in the negative reviews shows no obvious downward trend over time, except for distribution services. This shows that pharmaceutical e-commerce websites and enterprises still have great room for improving consumer satisfaction. Among the seven satisfaction factors, the prevalence of the price topic in positive reviews has shown a downward trend over time, which indicates that the low-price strategies implemented by pharmaceutical e-commerce websites and pharmaceutical enterprises are becoming less attractive to customers. Instead, customers pay more attention to the improvement of service quality. These findings provide a decision basis for the adjustment of business strategies.
Our study has several limitations that can be explored in future research. Firstly, we collect drug review data from just one platform. Future research could collect more review data on different categories of drugs from different platforms to make the results more generalizable. Secondly, our research adds the extremity and time of reviews as covariates to the STM, while future studies could incorporate more covariates. One option is to include the distinction between prescription and over-the-counter drugs as a covariate. Customers can currently buy both of them online, but the regulation of prescription medicines is stricter. In addition, the type of platform can also be added as a covariate in future research. The platforms for online drug purchases include not only large e-commerce platforms such as JD.com, but also some small online pharmacies. Customers may attend to different factors when purchasing drugs on different platforms. Thirdly, online reviews, as a reflection of the true voice of customers, are widely used in different research areas such as product/service improvement and sales forecasting. Our study is the first to draw on the valuable data resource of drug reviews. More research based on drug reviews in the future will enrich the literature on the management of pharmaceutical enterprises and the development of pharmaceutical e-commerce.