1. Introduction
Personal finance forums represent a major example of online communities [1]. The influence of online communities on the formation of opinions cannot be overstated; the electronic word-of-mouth (eWOM) phenomenon has been studied in several contexts. In [2], the factors that drive consumers to adopt and use the messages from other consumers were investigated, with information relevance and information comprehensiveness emerging as the most vital elements influencing information adoption within an online consumer community. It has been shown that both eWOM content and observing other consumers’ purchases significantly affect consumers’ intentions to buy a product [3]. However, although more positive comments lead to more purchases in many cases, involvement and prior knowledge partially moderate the relationship between the ratio of messages and the eWOM effect, since the credibility of web sites and eWOM messages can be damaged in the long run if all the eWOM messages are positive [4]. Potential customers (i.e., those who would like to purchase a product in the near future and are currently reading reviews with the intention to decide whether or not to buy that product) find the negative reviews containing service failure information and the positive reviews containing information on core functionalities, technical aspects, and aesthetics to be more helpful [5]. The impact of eWOM is particularly relevant for the financial industry. It has been shown that banks’ higher profitability is related to more verbalised positive feelings [6], and internet postings significantly affect stock prices [7]. Approaches to forecasting the prices of financial assets through the analysis of texts from different sources (financial reports, news, message boards, social media, etc.) are attracting increasing interest in the literature, as shown in the survey in [8].
The behaviour of participants in online communities is a major issue in any case, since their comments may be driven by several factors. Though posters are often led to join the discussion by their search for social support and social identification [9], members of brand communities may develop “oppositional brand loyalty” towards other rival brands [10], so their opinions are often biased. In fact, a basic analysis of the major personal finance forums reveals that almost 95 percent of conversations taking place are either neutral (mostly statement-based) or negative (with 30% directed specifically at brands) [1].
In general, we can therefore assume that personal finance forums may be a significant source of influence on personal finance choices and can lead to opinions directed by personal biases rather than objective analyses. It has been shown in [11,12] that the collection of sentiments on finance-related forums may provide a useful input to machine learning tools to manage one’s own portfolio and outperform traditional asset allocation methods. Personal finance forums should therefore be monitored by companies acting in the sector (e.g., financial companies, insurance companies, banks, etc.) in order to understand which forces drive them and how the interests of companies may be affected.
In this paper, we analyse personal finance forums, exploring their interaction dynamics and the sentiments conveyed by posts. In particular we wish to address the following research questions (RQs):
Is participation in online personal finance forums uniform?
Do some participants take a lead and possibly exert a higher influence on others?
Are major participants inclusive? Or are other participants put off by their dominance?
What sentiments are exhibited on posts?
We employ a mixture of approaches, adopting tools from industrial economics, social network analysis, and sentiment analysis. We apply those tools to the top threads of a major personal finance forum to get an exploratory view of the behaviour of posters on personal finance forums.
We provide the following contributions and findings:
3. Dominance in a Thread
A major aspect we wish to examine is the presence of dominance phenomena, i.e., of posters dominating the interaction that takes place within the thread. For the purpose of this analysis, a thread is seen as a sequence of posts and the associated sequence of posters. In this section, we analyse dominance phenomena by employing three tools: rank-size plots, the Hirschman–Herfindahl Index (HHI), and the concentration ratio of the top 4 ($CR_4$). After describing these tools, we examine the results of their application to our datasets. We adopt those three indices rather than established centrality indices in social networks, such as the degree centrality index, because the latter do not consider the presence of self-loops and cannot, therefore, properly represent the phenomena we are interested in.
The notion of dominance is considered here with reference to the number of submitted posts: individuals (or a group of individuals) are dominant if they submit a large fraction of the posts.
We first examine how posts are distributed by looking at the rank-size plot: after ranking posters by the number of posts they submit, the frequency of posts is plotted against the rank of the poster. It is to be noted that the rank-size distribution of the number of posts coincides with the degree distribution of the associated social network, examined, e.g., on Twitter [19]. We observe, of course, a decreasing curve, but the rate of decrease provides us with relevant information. First, the higher the rate of decrease, the higher the concentration of posts in the hands of a restricted number of posters (hence, their dominance). Second, we may observe a power law (a.k.a. a generalised Zipf law), where the number of posts $f_i$ submitted by the poster with rank $i$ decreases as the following expression (see [20] for a description of its characteristics):
$$f_i = \frac{k}{i^{\alpha}}, \tag{1}$$
where $k$ is a normalising constant and $\alpha$ is the Zipf exponent. On a doubly logarithmic scale, the power law relationship appears as a linear one, with the Zipf exponent measuring the slope (in absolute value) of the log-linear curve. Hence, the higher that slope $\alpha$, the higher the dominance. If we focus on the frequency ratio between the poster of rank $i$ and its runner-up in the ranking list, we easily obtain
$$r_i = \frac{f_i}{f_{i+1}} = \left(\frac{i+1}{i}\right)^{\alpha}. \tag{2}$$
The sequence of $r_i$ values gives us a more intuitive idea of how fast the frequency of posters decreases with the rank and how prevailing a poster is with respect to its less frequent colleagues. The Zipf law, first introduced by Zipf himself in [21] to describe the distribution of words, has been observed in a number of contexts, from the distribution of populations over cities [22] to the distribution of links on the web [23], the size of companies [24], and the distribution of traffic over a telecommunications network [25]. Assessing whether the distribution of posts follows Zipf’s law is relevant since it allows us to apply the body of knowledge developed for such a distribution.
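To give a feeling for the ratio between the posts of a poster and those of its runner-up under a power law, the following minimal Python sketch evaluates $((i+1)/i)^{\alpha}$ for a few ranks. The function name `zipf_ratio` and the exponent value used in the loop are illustrative assumptions, not quantities taken from our datasets.

```python
def zipf_ratio(rank: int, alpha: float) -> float:
    """Ratio between the posts of the poster at `rank` and those of the
    next-ranked poster, assuming an exact power law with exponent `alpha`."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    return ((rank + 1) / rank) ** alpha


if __name__ == "__main__":
    # alpha = 1.0 is an illustrative value, not an estimate from the paper.
    for rank in (1, 2, 10):
        print(rank, round(zipf_ratio(rank, alpha=1.0), 3))
```

With an exponent equal to 1, the top poster submits twice as many posts as the second one, while the tenth poster submits 10% more posts than the eleventh; larger exponents widen both gaps.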
In Figure 1, we have plotted the observed rank–size relationship for our 5 threads. We see a roughly linear trend, indicating an adherence to Zipf’s law for 4 out of 5 cases, leaving out the life insurance dataset, where the frequency of posts falls at a lower rate than what we would expect from a power law.
By performing a best quadratic fit, we get a rough estimate of $\alpha$, reported in Table 2. We see that the Zipf exponent is around 1.5 for 4 out of 5 cases and is much lower for the budget-minded dataset. In order to get a more intuitive feeling of how dominant some posters are, we can recall the ratio between the posts of a poster and those of its runner-up, as defined in Equation (2), and evaluate it for the top poster and the tenth poster (we denote the two values by $r_1$ and $r_{10}$). For the slowest decrease (lowest $\alpha$), exhibited by the budget-minded thread, we see that the top poster submits nearly twice as many posts as its runner-up ($r_1 \approx 2$), while the tenth most frequent poster still submits 10% more posts than its runner-up ($r_{10} \approx 1.1$). For the fastest decrease (the healthcare thread, since the life insurance thread deviates significantly from a power law, so that we cannot apply Equation (2)), we see that the first poster submits nearly three times as many posts as the second poster ($r_1 \approx 3$), and the tenth poster submits 15% more posts than its runner-up ($r_{10} \approx 1.15$).
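The estimation of the Zipf exponent can be sketched as follows. This is a minimal illustration under our own assumptions: the slope is obtained by ordinary least squares on the doubly logarithmic rank-size data, which is not necessarily the exact fitting procedure adopted for Table 2, and the helper names `rank_size` and `estimate_zipf_exponent` are hypothetical.

```python
import math
from collections import Counter


def rank_size(posters):
    """Post counts per poster, sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(posters).values(), reverse=True)


def estimate_zipf_exponent(counts):
    """Least-squares slope of log(count) versus log(rank), with sign changed.

    Assumption: a plain linear regression on the log-log data; all counts
    must be strictly positive.
    """
    if len(counts) < 2:
        raise ValueError("at least two posters are needed")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

The input of `rank_size` is the sequence of posters of a thread (one entry per post); its output can be passed directly to `estimate_zipf_exponent`. A thread deviating from a power law, as observed for one of our datasets, would still return a number, so the linearity of the log-log plot should be checked before trusting the estimate.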
As more general indices to assess dominance position, we borrow two from industrial economics: the Hirschman–Herfindahl Index (HHI) [26,27,28] and the $CR_4$ [29,30]. For a market where $n$ companies operate, whose market shares (expressed as a fraction of the total market) are $s_1, s_2, \ldots, s_n$, the HHI is
$$\mathrm{HHI} = \sum_{i=1}^{n} s_i^2.$$
The HHI satisfies the inequality $1/n \le \mathrm{HHI} \le 1$, where the lowest value corresponds to the case of no concentration (perfect uniform distribution of the market) and the highest value represents the case of monopoly. Therefore, the larger the HHI, the larger the concentration (hence, the dominance). Instead, the $CR_4$ measures the percentage of the whole market owned by the top four companies:
$$CR_4 = \sum_{i=1}^{4} s_i,$$
where the market shares are sorted in decreasing order.
Similarly, the higher the $CR_4$, the heavier the concentration.
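In our case, the fraction of posts submitted by a poster can be considered as its market share, so that the HHI can be redefined as
$$\mathrm{HHI} = \sum_{i=1}^{m} \left( \frac{f_i}{\sum_{j=1}^{m} f_j} \right)^2,$$
where $m$ is the number of posters and $f_i$ is the number of posts submitted by the generic $i$-th poster.
Instead, the $CR_4$ can be defined in terms of poster frequency as
$$CR_4 = \frac{\sum_{i=1}^{4} f_i}{\sum_{j=1}^{m} f_j},$$
where posters are ranked by decreasing number of posts.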
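The two concentration indices can be computed directly from the sequence of posters of a thread. The following is a minimal Python sketch; the function names (`post_shares`, `hhi`, `cr4`) and the toy thread in the usage note are illustrative assumptions and do not correspond to our datasets. Both indices are returned as fractions of the total number of posts.

```python
from collections import Counter


def post_shares(posters):
    """Fraction of posts submitted by each poster, in decreasing order."""
    counts = Counter(posters)
    total = sum(counts.values())
    return sorted((count / total for count in counts.values()), reverse=True)


def hhi(posters):
    """Hirschman-Herfindahl Index: sum of the squared post shares."""
    return sum(share * share for share in post_shares(posters))


def cr4(posters):
    """Concentration ratio: cumulative share of the four most active posters."""
    return sum(post_shares(posters)[:4])
```

For a toy thread of ten posts where one poster submits five posts, a second one two, and three others one each, `hhi` returns 0.32 and `cr4` returns 0.9. When all posters submit the same number of posts, the HHI reaches its minimum, i.e., the reciprocal of the number of posters.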
For our datasets, we get the results reported in Table 2. The guidelines provided by the U.S. Department of Justice set a point of demarcation between unconcentrated and moderately concentrated markets [31]. In our datasets, we find that the HHI is higher than that threshold in the Healthcare and First Real Job cases, very close in the Life Insurance case, and rather close in the Wealth Redistribution case. We can therefore conclude that a significant concentration is present in 2 out of 5 datasets.
If we turn to the $CR_4$, we can adopt the correspondence between market structure and $CR_4$ values reported in [32]. According to that classification, low $CR_4$ values correspond to effective competition, intermediate values to a loose oligopoly, and values above 60% to a tight oligopoly or a dominant firm. In Table 2 we see, by looking at the $CR_4$ indicator, that the top 4 posters submit more than 60% of all the posts in 4 out of 5 cases, leaving out just the Budget-Minded dataset. Delving deeper into the top 4, as reported in Table 3, we also see that the most frequent poster typically contributes a large portion of the posts, over 20% in 4 out of 5 cases (again with the exception of the Budget-Minded dataset, where the percentage is, however, a close 18.92%), with a peak in the First Real Job dataset, where the most frequent poster contributes a staggering 36.84% of the total number of posts. According to the $CR_4$, we have a tight oligopoly in 4 out of 5 threads and a loose oligopoly in the fifth one.
Summing up, all three indicators suggest that a significant level of concentration is present in those forums.
4. Interaction Dynamics
In addition to detecting dominance phenomena through measures that encompass the whole sequence of posts, we wish to understand interaction phenomena that take place at a smaller scale, i.e., over a much shorter subsequence of posts. In this section, we build the social network that is embedded in each thread and define measures that help us understand how posters interact. In particular, we look at two indicators (the number of self-replies and the number of rejoinders) that may reveal obnoxious behaviour.
We first build the social network embedded in each thread. The network graph is built by employing the following rules:
the nodes of the network represent the posters;
an edge is drawn between node i and node j if poster i replies to a post submitted by poster j (i.e., poster i comes immediately after poster j in the sequence of posts);
the weight of the edge between node i and node j is the number of times poster i replies to a post submitted by poster j.
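The construction rules above can be sketched in a few lines of Python. This is a minimal illustration: the function name `build_reply_network` and the representation of the network as a dictionary of weighted edges are our own assumptions, and self-loops (a poster following themselves) are kept in the output so that they can be examined separately.

```python
from collections import Counter


def build_reply_network(posters):
    """Weighted directed edges of the network embedded in a thread.

    `posters` is the sequence of posters, one entry per post, in posting
    order. The returned counter maps (i, j) to the number of times poster i
    posts immediately after poster j, i.e., replies to poster j.
    """
    edges = Counter()
    for previous, current in zip(posters, posters[1:]):
        edges[(current, previous)] += 1
    return edges
```

For instance, the sequence A, B, A, A, C, B yields five edges of weight 1: B to A, A to B, the self-loop on A, C to A, and B to C.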
We end up with a weighted directed network. The resulting network for the sample case of the Healthcare thread is shown in Figure 2, where we have adopted a degree centrality-based visualisation (actually, we show here the core of the network, removing all nodes whose degree is just 1). In that network, we do not show self-loops (representing self-replies) in order to not clutter the graph. Even if we do not consider self-replies, that degree-based representation shows a dominance of a restricted number of posters (shown in the core of the graph). Similar graphs are obtained for the other threads.
As we can see, just a small subset of the posters interact directly. We would have a complete direct interaction if every poster replied to a post of every other poster at least once. In that case we would get a fully meshed network, which is not the case of Figure 2. Instead, we see that the number of edges is much lower than its potential maximum (i.e., $m(m-1)$ for $m$ posters, which is roughly the square of the number of posters). The lack of direct interaction can be quantified by computing the sparsity of the associated weighted adjacency matrix, i.e., the percentage of zero entries. Its complement to 100% is the percentage of direct interactions. As shown in Table 4, the percentage of direct interactions can be lower than 5% (i.e., a sparsity larger than 95%, as in the Budget-Minded thread) and is anyway not larger than 20%.
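The sparsity computation can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, and whether the diagonal of the adjacency matrix (i.e., self-replies) is counted among the entries is left as a parameter, since either convention is compatible with the definition given above.

```python
def direct_interaction_percentage(posters, count_diagonal=True):
    """Percentage of non-zero entries in the adjacency matrix of a thread.

    An entry (i, j) is non-zero if poster i posts immediately after poster j
    at least once. With count_diagonal=False, self-replies are ignored and
    the matrix is assumed to have m * (m - 1) entries instead of m * m.
    """
    m = len(set(posters))
    pairs = {(cur, prev) for prev, cur in zip(posters, posters[1:])}
    if not count_diagonal:
        pairs = {(i, j) for (i, j) in pairs if i != j}
    cells = m * m if count_diagonal else m * (m - 1)
    if cells == 0:
        raise ValueError("not enough posters to define the matrix")
    return 100.0 * len(pairs) / cells


def sparsity_percentage(posters, count_diagonal=True):
    """Percentage of zero entries, i.e., the complement to 100%."""
    return 100.0 - direct_interaction_percentage(posters, count_diagonal)
```

On real threads, where the number of posters is large and most pairs never follow each other, the percentage of direct interactions is expected to be small and the sparsity correspondingly close to 100%.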
We now turn to the nature of those direct interactions by examining self-replies and rejoinders.
A poster may exert an excessive influence over the others not just by submitting more posts (which we detect by measuring the degree of concentration through one of the indices described in Section 3), but also by submitting posts in a rapid sequence, one after the other like a machine gun. We would therefore observe a sequence of posts submitted by the same poster; we call those posts self-replies because it is as if the posters were replying to themselves. Self-replies may represent an obnoxious behaviour because they occupy the virtual space and reduce the possibilities for other posters to state their opinion. It is a behaviour that runs counter to normal turn-taking. A related issue has been studied in verbal communications, where turn-taking has been shown to have a significant impact on the dynamics of a conversation [33,34].
Another relevant phenomenon is that of rejoinders. In fact, we may have the case of a poster A replying to a poster B who had just themselves replied to the poster A. This behaviour, which could be the manifestation of a flame or a sort of tit-for-tat, can be obnoxious as well, though the real nature should be better assessed by checking for the presence of aggressive or offensive language. We call the corresponding posts rejoinders. We classify as rejoinders not just the quick ABA patterns, but also the more complex ABBA, ABBBA, and so on, where the rejoinder comes after a sequence of self-replies.
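The detection of self-replies and rejoinders can be sketched in Python as follows. The counting rules are our reading of the definitions above and should be taken as assumptions: a self-reply is any post whose poster also submitted the immediately preceding post, and a rejoinder is counted whenever, after collapsing consecutive posts by the same poster into a single turn, a turn is taken by the same poster who took the turn before the previous one (which covers ABA as well as ABBA, ABBBA, and so on). The function names are hypothetical.

```python
from itertools import groupby


def count_self_replies(posters):
    """Number of posts submitted right after a post by the same poster."""
    return sum(1 for prev, cur in zip(posters, posters[1:]) if prev == cur)


def count_rejoinders(posters):
    """Number of turns taken by the poster who held the turn before last.

    Consecutive posts by the same poster are first collapsed into one turn,
    so that ABBA and ABBBA are treated like ABA.
    """
    turns = [poster for poster, _ in groupby(posters)]
    return sum(1 for k in range(2, len(turns)) if turns[k] == turns[k - 2])
```

Under these rules, the sequence A, B, B, B, A contains two self-replies and one rejoinder, while A, B, C, A contains neither. A prolonged exchange such as A, B, A, B counts two rejoinders, one for each poster replying back to the other.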
In Table 5, we report the number of self-replies and rejoinders detected in our 5 threads. We can compare those figures with the overall number of posts shown in Table 1. Self-replies represent a large portion of posts for the Life Insurance thread (20.3%), while they are just a minor portion (4.2%) for the Budget-Minded thread. Rejoinders are instead a widespread phenomenon for all threads, representing more than a quarter of all posts for the Wealth Redistribution, Healthcare, and First Real Job threads.
Overall, the dynamics show a heavy presence of direct interactions between a small subset of specific pairs of posters and a significant presence of self-replies.