Review Reports - This Is the Way People Are Negative Anymore: Mapping Emotionally Negative Affect in Syntactically Positive Anymore Through Sentiment Analysis of Tweets

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear author,

Please find my comments/suggestions in the attached file.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Please see the attached file.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you for your paper, which I have read with pleasure. It’s an interesting topic, the writing is agreeable and the exposition is clear. I have four major comments regarding the design of the analysis and the annotation:

** Starting from the observed statistical differences in valence between the different classes of triggers for NPI-anymore and NPAM, the paper concludes that NPAM really does occur with negative affect, much more so than NPI-anymore. There is, however, a logical fallacy here: it is equally possible that some sentences contain an NPI-trigger but are actually NPAM usages. In such cases the trigger simply does not trigger anything, and the NPAM interpretation does not depend on it. The point is that the mere presence of a trigger does not necessarily or exclusively entail an NPI-reading.

In addition, some of the examples the authors give as NPI-usages can also be interpreted as NPAM-instances. Take, for instance, “(6) You’re the only thing that makes sense anymore”. Why would a reading as NPAM not be possible? After all, if one substitutes nowadays for anymore, one gets a perfectly fine sentence. Or “(8) Do liberals even know what they stand for anymore?” Again, substituting nowadays for anymore seems totally acceptable.

This brings me to the following recommendation: whether anymore is used as NPI or NPAM should be assessed independently of the classification of triggers, because there is no univocal relationship between triggers and (pragmatic) use. If you accept that NPAM develops out of NPI, there must be some ambiguity (or 'bridging contexts') in the functioning of the underlying triggers. Such an independent assessment could be done by means of a lexical substitution with nowadays, for instance, but it requires going back to the data (or to a sample of the data, if annotation is too costly). Simply saying that triggers are often good at distinguishing NPI from NPAM is not sufficient.
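To make the suggested procedure concrete, the substitution frames for an annotator could be generated along the following lines. This is only a minimal sketch: the helper name `nowadays_frame` is hypothetical, the two sentences are examples (6) and (8) quoted above, and the acceptability judgement itself remains a manual step that the code does not perform.

```python
import re

# Matches "anymore" as a whole word, regardless of capitalisation.
ANYMORE = re.compile(r"\banymore\b", flags=re.IGNORECASE)


def nowadays_frame(sentence: str) -> str:
    """Return the sentence with every 'anymore' replaced by 'nowadays'.

    Hypothetical helper: it only builds the frame that a human annotator
    would then judge for acceptability.
    """
    return ANYMORE.sub("nowadays", sentence)


# Examples (6) and (8) as quoted in this report.
examples = [
    "You're the only thing that makes sense anymore",
    "Do liberals even know what they stand for anymore?",
]

for original in examples:
    print(original, "->", nowadays_frame(original))
```

Each original/substituted pair could then be shown to annotators on a sample of the data, independently of the trigger class assigned to the sentence.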

** I feel that the classes of triggers mix syntactic with semantic information. Coupled with the fact that the study assesses pragmatic/semantic features on these classes, this makes things less clear.

There is, for instance, a problem of circularity if one distinguishes a priori classes of negative words (from other, neutral words) and then verifies whether the valence of these classes differs. That the class of “Negative Affect” is the most negative seems pretty obvious. What would make more sense is to 1) define the trigger classes exclusively in syntactic terms, 2) see how strongly each class is associated with NPI/NPAM and 3) see whether valence/dominance/arousal differ between those classes for each of NPI/NPAM. I think that separating those three dimensions in a more principled way will offer more benefits to the analysis (which arguably obfuscates syntactic and semantic differences).

Furthermore, it is not clear to me how the class of “Semantic NEG” differs from the class of “Negative Affect”, or how the valence of those classes can be so different. It might be good to give examples of items that end up in each class, to understand whether such a distinction is actually relevant.

** It feels counterintuitive to have the analysis based on the Osgood-variables come after the analysis based on the VADER-algorithm, because the former is clearly more coarse-grained than the latter. So you could have the Osgood-analysis as a sort of preliminary thing to look at, while the VADER-analysis tackles the analysis at the right level of granularity. If I can be even stronger, the Osgood-analysis doesn’t offer any benefit on top of the VADER-analysis.

Earlier on in the paper the authors argue, with reason, that individual social media posts lack context. I find that a strange rhetorical stance, given that what you do with sentiment analysis definitely does not take broader context into account (it is a typical 'reductionist' NLP-technique). The Osgood-way is even more acontextual and coarse-grained.

** How come there is no multiple regression analysis on those indices that includes the interaction of dialect region and the trigger classes? It would make a lot of sense to check the impact of those two independent variables simultaneously in one model (per index), instead of in separate analyses (by means of KW-tests).
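As an illustration of what such a model looks like, here is a minimal sketch of an ordinary least squares fit with a region-by-trigger interaction, written in plain Python. Everything in it is an assumption made for illustration: the scores are invented toy values, "East"/"West" and "A"/"B" are placeholder labels for two regions and two trigger classes, and the function names are hypothetical. It returns point estimates only; a real analysis would use a statistics package that also reports standard errors and tests.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_interaction(rows):
    """Fit score ~ region + trigger + region:trigger via the normal equations.

    Returns [intercept, region effect, trigger effect, interaction], with
    "East" and "A" as the reference levels (dummy coding).
    """
    design, y = [], []
    for region, trigger, score in rows:
        west = 1.0 if region == "West" else 0.0
        class_b = 1.0 if trigger == "B" else 0.0
        design.append([1.0, west, class_b, west * class_b])
        y.append(score)
    k = 4
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented toy valence scores: (region, trigger class, score).
toy_rows = [
    ("East", "A", 0.2), ("East", "A", 0.4),
    ("East", "B", -0.1), ("East", "B", 0.1),
    ("West", "A", 0.3), ("West", "A", 0.5),
    ("West", "B", -0.6), ("West", "B", -0.4),
]

names = ["intercept", "region[West]", "trigger[B]", "region[West]:trigger[B]"]
for name, coef in zip(names, fit_interaction(toy_rows)):
    print(f"{name}: {coef:.3f}")
```

The interaction coefficient is the quantity that separate per-factor tests cannot provide: it estimates how much the effect of trigger class on the index changes from one region to the other.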

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

See pdf

Comments for author File: Comments.pdf

Author Response

We are grateful to the reviewer for their continued thoughtful and helpful feedback and engagement. We’ll summarise our revisions, which we hope resolve some of the remaining issues with our manuscript. We also offer our rationale for points of feedback that we feel unable to implement.

We suspect some of the remaining unclarity in our trigger categories of ‘Semantic NEG’ and ‘Negative Affect’ may stem from the labels we chose for these categories. We appreciate this sounds initially like a superficial revision, but we have changed these categories to ‘Inherent NEG’ and ‘Emotionally Charged’ throughout all materials in the paper. We have also further revised our introductions of these categories, particularly Inherent NEG, to attempt to clarify the fundamental point that the triggers in that category standardly license NPIs, while the triggers under Emotionally Charged should never license NPIs. Finally, we added six constructed examples using the NPI ever at ll. 439-455 to illustrate these NPI-licensing (or non-licensing) characteristics.

We hope this revision also goes some way to addressing the reviewer’s concerns that instances of NPAM may occur in standard NPI-licensing trigger categories and be included in those results. We have also revised at ll. 455-458 as, “While we acknowledge […] that this is an imperfect coding scheme, it provides a framework for differentiating between clauses that license NPI-anymore and clauses that could only contain NPAM.” We hope this clarifies that we can only differentiate between clauses that must be NPAM and clauses that can standardly be NPI-anymore.

Beyond this, we are afraid that the intuition-based evaluation the reviewer requests is not possible, given the facts about NPI-anymore and NPAM. As we tried to illustrate with nowadays substitution, both usages describe a current state of affairs and imply a contrast with the past. In fact, a question hanging over NPAM is whether it really is a different word from NPI-anymore or just a loosening of licensing requirements. Arguably, the best candidate for suggesting that NPAM is actually a different word from NPI-anymore is the affective features we are examining in this paper (but of course those can’t be used to decide whether an instance of anymore is NPAM or a standard NPI, as the reasoning would be circular). We can only objectively say that X must be NPAM, or that Y is a standard environment allowing NPIs which could actually also be NPAM. (On that note, as far as we know, the only paper to engage with the possibility of NPAM passing in NPI licensing environments and to discuss the implications of this is the 2019 paper that our data was collected for.)

We also appreciate the reviewer’s preference for a reordering of our results, but respectfully continue to wish to resist this approach. We don’t agree entirely with the characterisation of VADER as finer-grained than the implementation of Osgood’s semantic differential ratings. For sure, VADER is trained on tweets, polarity sensitive, and measures valence for individual tweets. In those senses, the reviewer is absolutely correct that VADER is finer-grained. But in another sense, interpreting valence from VADER and the Osgood-type ratings is straightforward, whereas arousal and dominance from Osgood are more nuanced and require more careful evaluation. That is, the conclusion from valence is that NPAM is used with unhappier language than NPI-anymore, while the conclusions for arousal and dominance are that, perhaps, NPAM is more strongly associated specifically with complaint in the East Midland than the West Midland. So from an interpretation standpoint, the outcomes of the Osgood-type approach are finer-grained. That was our rationale for beginning by reporting the headline result of valence, and then moving through the trickier cases of arousal and dominance. It feels odd to us to go from the more nuanced results for arousal and dominance to the clearer one of valence (an oddness reinforced by the fact that dominance scores are highly correlated with valence in the OSD framework we applied). If we maintain the current order of sentiment dimensions but flip Osgood to come before VADER, we would have to report either Osgood valence, arousal, and dominance and then jump back to VADER valence, or Osgood valence, VADER valence, Osgood arousal, Osgood dominance. Either feels less intuitive for readers to follow than our current organisation.

For these reasons, while we entirely appreciate the reviewer’s argument for reordering, we still wish to maintain the current presentation of results. We hope that our explanation of our rationale for the current structure makes the presentation of results more acceptable.

We thank the reviewer again for their feedback and advice. We hope that our further revisions have addressed some remaining problems, and that there can be understanding for the points we have not implemented.

Round 3

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you for replying once again to my points. I've noticed the changes in the manuscript, and I think they sufficiently address the concerns that were raised. I appreciated the discussions that have taken place, perhaps conducted somewhat bluntly on my side (for which I apologize), but always with the intention of contributing to a better linguistics.