Article
Peer-Review Record

Cancelling Flash Illusory Line Motion by Cancelling the Attentional Gradient and a Consideration of Consciousness

by Katie McGuire, Amanda Pinny and Jeff P. Hamm *
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 9 November 2018 / Revised: 26 December 2018 / Accepted: 7 January 2019 / Published: 10 January 2019
(This article belongs to the Special Issue Visual Orienting and Conscious Perception)

Round 1

Reviewer 1 Report

The authors present an interesting set of experiments examining the role of attentional gradients in flash illusory line motion (flashILM), and then discuss some implications for understanding consciousness (although I do not feel qualified to comment on the latter section). 


In the first experiment, the cue-bar ISI is varied from 16.7 ms to 466.7 ms, and the results show that the strength of flashILM is very strong at the shortest delay (canceling out a real motion signal) and weakens as the ISI grows. The results of this experiment are clear and show convincingly that flashILM has an effect that can be quantified as canceling a real motion signal in the opposite direction. Interestingly, although ILM can still be measured at ISIs of 166.7 ms and above, these effects are never strong enough to cancel out a real motion signal.


Based on the results of experiment 1 (specifically that ILM is present at both 16.7 ms and 166.7 ms), the authors designed experiment 2 to test whether presenting an initial color change to the bar flattens the attentional gradient such that a subsequent color change does not produce flashILM. The first color change occurred at 16.7 ms (presumably causing a large ILM, which was to be ignored by participants), and the second (critical) color change occurred at 166.7 ms from the flash (150 ms from the first color change). Responses following the second color change showed no bias in one direction or the other, suggesting that the first color change effectively canceled the attentional gradient produced by the flash, and that the second color change therefore did not lead to a second ILM.


While this interpretation of experiment 2 is intriguing, the use of only one SOA of 150 ms between color changes (in the double color change condition of Experiment 2) leaves other interpretations open. In particular, there is a phenomenon related to ILM termed Illusory Rebound Motion, or IRM (Hsieh, P. J., Caplovitz, G. P., & Tse, P. U. (2005). Illusory rebound motion and the motion continuity heuristic. Vision Research, 45(23), 2972-2985). Here, an initial motion percept similar to flashILM is produced, for example with a red square being adjoined by a red bar. Then, following an SOA (between 50 ms and 500 ms), the entire bar (including the part that was originally the red square) changes color, for example to green. Observers report that upon this color change, there is a motion percept in the opposite direction of the initial ILM. If the bar changes back to red, the motion reverses again, and so on (hence "illusory rebound motion"). The effect seems to be strongest with SOAs of 400 ms or more, but it is measurable even with shorter SOAs of 75 ms or 100 ms. In experiment 3 of the same paper, the authors included a detection task and showed that attention is *not* drawn to one or the other end of the bar at the longer SOAs, so they argue that attention cannot fully account for the IRM, at least at longer SOAs.


In connection with the current work, it seems possible that the lack of flashILM in Experiment 2 (double color change condition) is a result of two processes canceling each other out. Perhaps there is still an attentional gradient that could produce ILM, but there is also an opposing IRM-type effect resulting from the initial ILM produced following the first color change, and at the 150 ms SOA between the first and second color changes these two processes happen to cancel out.


To test this hypothesis, the authors could conduct an additional study identical to Experiment 2, but shortening the SOA between the two color changes so that IRM cannot occur (e.g. 33.3 ms). If there is still no flashILM following the second color change at this short SOA, then it could not be because IRM counteracted it, and the authors' conclusion would be strengthened. However, if there is measurable flashILM with this short SOA between color changes, then it might suggest that the first color change did not (completely) eliminate the attentional gradient, and it would provide more clues about the time course of the flattening of the attentional gradient following the first color change.


Further, if the authors also include a condition with a longer SOA (e.g. 200 ms between color changes), they might predict an IRM-type response, with flashILM in the opposite direction.


I believe this additional data (especially a shorter between-color-changes SOA) would allow for a much stronger interpretation of the results and a better understanding of the role of attentional gradients in flashILM.


I found two small issues/typos: 

- Line 28 should say "two objects"

- On line 140, describing the process of eliminating trials, it seems redundant to say that trials with RT > 4000 ms were excluded when it was already mentioned that trials > 2000 ms were eliminated.

- In line 228 it should say 16.7 ms, not 66.7 ms


Author Response

The authors present an interesting set of experiments examining the role of attentional gradients in flash illusory line motion (flashILM), and then discuss some implications for understanding consciousness (although I do not feel qualified to comment on the latter section).


In the first experiment, the cue-bar ISI is varied from 16.7 ms to 466.7 ms, and the results show that the strength of flashILM is very strong at the shortest delay (canceling out a real motion signal) and weakens as the ISI grows. The results of this experiment are clear and show convincingly that flashILM has an effect that can be quantified as canceling a real motion signal in the opposite direction. Interestingly, although ILM can still be measured at ISIs of 166.7 ms and above, these effects are never strong enough to cancel out a real motion signal.


Based on the results of experiment 1 (specifically that ILM is present at both 16.7 ms and 166.7 ms), the authors designed experiment 2 to test whether presenting an initial color change to the bar flattens the attentional gradient such that a subsequent color change does not produce flashILM. The first color change occurred at 16.7 ms (presumably causing a large ILM, which was to be ignored by participants), and the second (critical) color change occurred at 166.7 ms from the flash (150 ms from the first color change). Responses following the second color change showed no bias in one direction or the other, suggesting that the first color change effectively canceled the attentional gradient produced by the flash, and that the second color change therefore did not lead to a second ILM.


While this interpretation of experiment 2 is intriguing, the use of only one SOA of 150 ms between color changes (in the double color change condition of Experiment 2) leaves other interpretations open. In particular, there is a phenomenon related to ILM termed Illusory Rebound Motion, or IRM (Hsieh, P. J., Caplovitz, G. P., & Tse, P. U. (2005). Illusory rebound motion and the motion continuity heuristic. Vision Research, 45(23), 2972-2985). Here, an initial motion percept similar to flashILM is produced, for example with a red square being adjoined by a red bar. Then, following an SOA (between 50 ms and 500 ms), the entire bar (including the part that was originally the red square) changes color, for example to green. Observers report that upon this color change, there is a motion percept in the opposite direction of the initial ILM. If the bar changes back to red, the motion reverses again, and so on (hence "illusory rebound motion"). The effect seems to be strongest with SOAs of 400 ms or more, but it is measurable even with shorter SOAs of 75 ms or 100 ms. In experiment 3 of the same paper, the authors included a detection task and showed that attention is *not* drawn to one or the other end of the bar at the longer SOAs, so they argue that attention cannot fully account for the IRM, at least at longer SOAs.


In connection with the current work, it seems possible that the lack of flashILM in Experiment 2 (double color change condition) is a result of two processes canceling each other out. Perhaps there is still an attentional gradient that could produce ILM, but there is also an opposing IRM-type effect resulting from the initial ILM produced following the first color change, and at the 150 ms SOA between the first and second color changes these two processes happen to cancel out.

To test this hypothesis, the authors could conduct an additional study identical to Experiment 2, but shortening the SOA between the two color changes so that IRM cannot occur (e.g. 33.3 ms). If there is still no flashILM following the second color change at this short SOA, then it could not be because IRM counteracted it, and the authors' conclusion would be strengthened. However, if there is measurable flashILM with this short SOA between color changes, then it might suggest that the first color change did not (completely) eliminate the attentional gradient, and it would provide more clues about the time course of the flattening of the attentional gradient following the first color change.

Further, if the authors also include a condition with a longer SOA (e.g. 200 ms between color changes), they might predict an IRM-type response, with flashILM in the opposite direction.


Authors' response:

This is a very interesting possible line of inquiry.  The rebound illusion protocol, as described above and given in detail in the Hsieh et al (2005) article, starts with a single inducer and involves matching the colour of the bar with the inducer, which combines the polarized gamma motion (PGM) protocol with the transformational apparent motion (TAM) protocol.  Both of these illusions are known to reverse direction when an existing bar offsets, meaning the illusory motion is towards the inducer rather than away from it.  That does not happen with flashILM, where the motion for both onset and offset bars is away from the location of the flash.  Recent work also suggests that flashILM, PGM, and TAM, while perceptually similar, are not the same illusion (see Hamm, 2017 and Han & Hamm, in press).  In other words, neither TAM nor PGM requires attention to be asymmetrically allocated, so the re-allocation of attention to the terminal end of the illusion would not be necessary for the rebound illusion to occur, regardless of which of these latter illusions it reflects.  To clarify, it seems possible that the 2nd colour change in these displays could be processed similarly to an offset event: when the bar and marker change to the 2nd colour, this is treated as an offset of the 1st colour, and for both TAM and PGM that would by itself produce motion towards the marker, without requiring the initial motion (anecdotally, a quick set-up of a blue rectangle that changes to a green bar while retaining a blue square at one end does, for myself, produce motion towards the blue square).  This differs slightly in that, in the anecdotal set-up described, the marker must be retained, but the possibility that the initial motion also has an influence on the object and/or the early visual system is a question ripe for investigation.
If the rebound illusion is a form of TAM or PGM, then the current protocol is not one that should produce either of these illusions.  In addition, we had run a smaller study (only 12 participants), similar to E1 (so all 4 delays), in which dual colour changes always occurred (so at the short delay, the bar would change to "red" for a single screen refresh and then change to the target colour).  While the shortest delay produced a strong illusion away from the flash (mean area of 4.52, t(11) = 4.52, p < 0.001, pH0|D < 0.01), none of the other delays produced a detectable illusion (-0.31, -0.55, and -0.69, all t(11) < 1.23, p > 0.2, all pH0|D > 0.61; a one-way ANOVA on these last 3 values results in the same conclusion, and even the test for a linear trend does not approach significance, F(1,11) = 0.513, p = 0.481, pH0|D = 0.72).  So while the shorter SOA is much less than suggested, we did obtain flashILM at the shorter interval, and there was no evidence of a rebound illusion emerging at the longer intervals, where IRM was detected about 80% of the time in the Hsieh et al (2005) study.


This leaves us with whether or not to further explore IRM as a possible 2nd illusion co-occurring in E2.  While a more complex explanation, it has been shown that multiple illusions can co-occur and contribute independently to the variance of the area measure (see Hamm, 2017 and Han & Hamm, 2018 for cases where this has been demonstrated).  However, this discussion would also require us to speculate on whether or not IRM actually occurs in the current display protocol; if IRM is a form of TAM or PGM, then the display conditions do not appear to be such that it necessarily should, and the study above does not suggest that it does.  While the sample size is smaller than in our other studies, it is equal to or larger than those in all but one of the experiments in the Hsieh et al (2005) article (12, 18, 9, 10, and 7, in E1 through 5, respectively).  Also, the cuing task of experiment 3 of that article shows no costs or benefits at the 2 longer post-illusion SOAs, rather than IOR at either of these.  This is the same finding as in Hamm & Klein (2002), where we suggested that attention redistributes over the display rather than shifts to the terminal end, which we now point the reader to in the introduction:

“Hamm and Klein [1] found no evidence for such a complete shift but rather found an expansion such that attention was redistributed over the display rather than focused at one end.  A similar finding is reported in experiment three by Hsieh, et al. [21].”


In addition, we have mentioned the smaller study, and those findings, in footnote 2, where we also suggest that as a result we do not believe that IRM is co-occurring.  This, we feel, avoids the necessity of an in-depth discussion of the protocols involved, of whether or not IRM can be measured by ILMarea (that has not been shown before, so it could be that IRM "vanishes" when real motion gets involved, or that IRM is not measurable by the cancelation procedure, which would suggest that it is neither flashILM, TAM, nor PGM, of course, as all of those can be quantified this way; to be clear, I doubt IRM would in fact be unmeasurable with this procedure, but as it has not yet been demonstrated, the possibility is there), and of a number of other such issues, only to end up concluding that IRM does not appear to be involved (given the lack of evidence that it is emerging at the longer intervals), as that starts to feel a bit like a strawman set-up.


I believe this additional data (especially a shorter between-color-changes SOA) would allow for a much stronger interpretation of the results and a better understanding of the role of attentional gradients in flashILM.



As in the longer note above, and a point I wish to ensure is clear, we agree with the reviewer that IRM is a hypothesis that deserves a few dedicated studies, as there are a number of potentially fascinating directions in which such an investigation could lead.  We thank the reviewer for this suggestion, and hope to be able to provide a more in-depth look at this at some point in the not too distant future.  We also recognize that the shorter interval in the footnoted study is much shorter than the one the reviewer is suggesting; in that study we just wanted to ensure that flashILM occurred normally when there was not time for attention to redistribute, as the thinking is that redistribution probably occurs subsequent to the illusory motion, so the quick double change should not allow for it.  Titrating out the time course between delay 1, where the double change produces an area measure similar to E1 and E2 (single change), and delay 2, where single changes produce ILM and the double change conditions do not detect any area of separation between the curves, would provide finer details.  In the end, however, we have a starting condition that shows separation that becomes undetectable, and at longer intervals there is no evidence for a reversal emerging (even though IRM is robust at the two longer intervals).  Since E1 shows flashILM decreases, if IRM were occurring at the longer intervals in this case it should emerge.



I found two small issues/typos: 

- Line 28 should say "two objects"

Thank you - corrected.

- On line 140 describing the process elimination of trials, it seems redundant to say that trials with RT > 4000 ms were excluded, when it was already mentioned that trials > 2000 ms were eliminated.

Yes, but this is reported to differentiate responses that were made, but too slowly to be considered representative of the behaviour in question, from trials where no response was made before the trial simply terminated.

- In line 228 it should say 16.7 ms, not 66.7 ms

I believe this should be line 230? It has been corrected there (and the SOAs have been corrected to ISIs).



Reviewer 2 Report


The experiments are well conceived and address an interesting hypothesis and the general theme of the paper seems to fit with the journal. My recommendation is to accept pending revisions. Those would include careful editing for conciseness and clarity, some numerical support for important null results, and perhaps a large change in the discussion. Some notes on these follow. I would like all of these addressed in a revision.


Notes:


Please tighten up the abstract. For example, the following could be more concise, "The current study examines ILM in order to examine these predictions. The current study also demonstrates that ILM can be induced in an existing object display in the form of a “direction of painting” illusion."


I'm a bit concerned about trial exclusion. This is not a new paradigm for this lab, so they should have known beforehand that they don't want trials with greater than 2 s RTs. Therefore, it seems odd that 4 s was permitted. In addition, while I don't disagree that an RT in this paradigm of less than 200 ms is probably not a considered response to the stimulation, the reason for this should be stated for the record. The same is true for 2 s being considered too long; one wonders why not 1900 ms or 1950 ms. And in the future, might I suggest making the planned cutoff for long RTs the experiment's cutoff for RTs.


l.160-163 There's no description of the plan for these contrasts, nor a justification. This is similarly described as a planned analysis throughout, so the general plan needs to be laid out prior to any of these analyses.


In general there's a slightly odd mix of parametric and non-parametric modelling. The data on the left side of Figure 2 has simple parametric solutions. And the right column would be Cauchy(?) or two exponentials. Anyway, they look very much like solvable problems so I'm not sure why parametric approaches aren't taken here like they are elsewhere. And let's not be too tied to the brief history of ILMarea calculation as a justification. The mix of approaches without any consideration for the change in meaning of the analyses is a bit untoward and I think should be accompanied by some comment at least.


On a similar note, I'm wondering if the log of the SOAs results in linear effects in Fig. 3 (just put on log graph). It's very common for SOA effects within a reasonable range to have a log relationship.


The authors tend to say that motion is cancelled. This is unlikely to be strictly the case. The responses participants can give are only left and right; there is no option to say the line came on all at once, or to report any other kind of motion. Further, actual percepts of a simultaneous line are rare in these experiments, and oftentimes there is a very weak perception of motion left, right, or colliding somewhere in the middle. It would be much more accurate to say that a consistent direction of motion is reduced (maybe even "cancelled" works there if "consistent" is included). Then they're leaving open the very likely case that some motion of some kind is perceived nearly all of the time. And it avoids strawman papers later that start, "So and so said line motion is cancelled, eliminated, completely removed from perception..."


l.254+ There's unnecessary redundancy here given apparatus and stimuli are identical to what is written for E1 and the procedure is substantially the same. Please shorten and also add in the differences in stimuli.


While the reference to Han et al. is handy, remind people in this manuscript that ILMarea can be negative and is always calculated relative to flash locations and not necessarily a hypothesized line direction. This is especially important given the second line could be hypothesized to reverse.


l. 361+ This is just wrong. The authors are concluding no differences between experiments based entirely on null results of the tests. This is essentially concluding no differences based on nothing, since the test can't be used to decide that the null is true. There is all kinds of information here that could assist in concluding no difference. At least make some qualitative statements about similar patterns of results, for starters. Boost those by showing confidence intervals that (hopefully) suggest that if there are between-experiment effects they are not very large. But this whole section needs to be excised as it stands, because none of it does what it's purported to do at the opening. If there must be a test, there are tests that allow one to conclude no difference, but I'm not a fan of them.


And to follow, on a related note, the claims of effects going away, being eliminated, or gone, really need to be tempered into statements of reduced effects, or no longer detectable effects. Reported confidence intervals could at least support arguments that effects are either 0 or inconsequential (if they can support that argument). But as it is we don't know if the range of likely mean ILMarea values (for example) includes meaningful amounts that make claims of "no effect" troublesome. In short, it's critical to focus on actual estimates wherever one makes a claim of no effect and, since the estimated mean must be incorrect with continuous data, a plausible range of estimates needs to be used to support arguments of no effect.


This also brings me to unnecessary tests. For example the tests against 0 starting at l.318 are superfluous and used to bolster an argument in an unwarranted way. If it's already shown that the conditions differ then minimally, given this pattern of data, the early condition must differ from 0 and a test is redundant. Further, the fact that the double condition does not differ from 0 is not evidence that it is 0. This requires a better argument that, at the very least, includes arguing that the plausible effects there could be aren't really meaningful or consequential values. (As an aside, such discussion would eventually result in the authors needing to make qualitative statements about what large or small ILM area values are, which would be a good thing for the primary users of these measures to write.)


l.526 "forth" should be "fourth"


Fig. 8B, perhaps the y-axis label could be effect in ms? Also, perhaps these signal strengths should be on the same scale.


Fig. 9 and 10 x-axis labels are a mess. Higher quality figures overall are needed here.


The whole speculation on consciousness minimally needs one more thing: a somewhat more in-depth speculation on alternative models. I'm not sure how this special issue is being conceived by the editors, so I'm not going to suggest it be removed. I'd like to see it expanded with some assessment of the fragility of the proposal. What if something as simple as failing the assumption of equal variance breaks it? I'm not saying it's bad, but it seems like the germ of an idea that perhaps deserves a deeper exposition in its own paper.



Author Response


Please tighten up the abstract. For example, the following could be more concise, "The current study examines ILM in order to examine these predictions. The current study also demonstrates that ILM can be induced in an existing object display in the form of a “direction of painting” illusion."

 

I'm a bit concerned about trial exclusion. This is not a new paradigm to this lab so they should know they don't want trials with greater than 2s RTs beforehand. Therefore, it seems odd that 4s was permitted. In addition, while I don't disagree that an RT in this paradigm less than 200 ms is probably not a considered response to the stimulation it should be stated for the record the reason for this. The same is true for 2s being considered too long. One wonders why not 1900 ms or 1950? And in the future might I suggest making the planned cutoff for long RTs the experiment cutoff for RTs.

- The decisions for the cutoffs are given (<200 ms are considered anticipations and >2000 ms are considered distractions).  While it does seem "odd" to allow the trial to continue to 4000 ms, only to later discard responses longer than 2000 ms, it is not entirely without reason.  If trials terminate at 2000 ms, this increases the pacing at a time when the participant appears to be distracted.  This could be anything: shifting posture, momentarily not focusing on the task (mind wandering), freezing with regards to responding, etc., the sort of random events that occur during data collection in any experiment of human behaviour involving multiple trials like this, and in which case the terminated trial does not get resolved by a participant's response.  While we have not personally investigated whether this lack of resolution due to lack of responding actually produces an effect, which is a question of interest well beyond this particular area, this 2-stage maximum (one during the procedure and one during the analysis) was adopted on the basis that it would minimize any potential flow-on effects of non-responding trials should there actually be any; if there are no such effects, then no harm is done (trials where a response is executed between 2 and 4 seconds would have been dropped anyway).  It has been shown that trials following erroneous responses often result in a slowing of the response to the next trial, so it is not entirely unreasonable to suspect that there could be some sort of flow-on effect following "failure to respond" as well.  We have not included a description of this reasoning, in part because we have yet to investigate it; it simply remains a protocol that we follow that at best removes further contamination and at worst just looks odd.  While the exact cutoff values were simply chosen as reasonable values in Han et al (2016), they are now employed without adjustment in all studies from this lab.
They are not, for example, chosen for each individual study through any sort of trial and error (i.e. p-hacking).  There are many, often somewhat arbitrary, decisions concerning methodology that one makes during the selection of experimental parameters and analysis, referred to as "researcher degrees of freedom" by Simmons et al (2011).  Because one of the arguments we have been making throughout a series of publications is that the ILM field appears to have no standard approach, we have chosen to standardize as many of these as possible for studies employing the cancelation approach.  We use the same double cutoff procedure and cutoff values, and we quantify the illusion using the same metric (what we call ILMarea for convenience, noting that it is also what one directly observes from looking at the two group-mean-based percept curves: the large gap that appears between them).  We accept that some of these decisions could have been made differently (1900 or 1950 ms as the upper cutoff, as the reviewer suggests above, but we see no reason why those would be better, and note that no reason is offered that they are better, only that they are a different semi-arbitrary value).  However, once those decisions were made the first time (Han et al, 2016), we have held to them afterwards to avoid any sense of "picking and choosing" values to make things work.  This leaves scope for researchers interested in systematically investigating the influence of choosing different values through methodological impact studies, but those questions are not the current focus of this set of studies.  Rather, by holding to a common set of methodological decisions to the extent possible, we are limiting the sources of variation between the studies in the literature.
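For concreteness, the two-stage cutoff described above amounts to a simple filter at analysis time. This is only a sketch: the cutoff values are those reported in the manuscript, while the RT data and the function name are invented for illustration.

```python
# Two-stage RT exclusion: trials without a response terminate at 4000 ms
# during the procedure; at analysis, responses under 200 ms (anticipations)
# or over 2000 ms (distractions) are discarded.

def exclude_trials(rts_ms):
    """Keep only trials with 200 ms <= RT <= 2000 ms."""
    return [rt for rt in rts_ms if 200 <= rt <= 2000]

rts = [150, 450, 820, 2150, 4000, 990]   # 4000 ms = trial that timed out
print(exclude_trials(rts))               # [450, 820, 990]
```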

 

l.160-163 There's no description of the plan for these contrasts and justification. This is similarly described as a planned analysis throughout so the general plan needs to be laid out prior to any of these analyses.

The term "planned contrasts" is a standard statistical analysis term meaning that, prior to the collection of the data, we intended (we planned) to do a trend analysis.  The contrasts for trend analysis are set values; for example, over 4 equally spaced conditions, as with the delay conditions for E1, the linear trend contrast is -3 -1 1 3 and the quadratic is 1 -1 -1 1 (or any set of values in those ratios, so the linear could be -0.75 -0.25 0.25 0.75, for example).  In other words, the presentation as it stands lays out the "general plan" of the analysis: it was decided upon a priori, and we used trend analysis contrasts, which are standard contrast vectors available in tables in statistics textbooks and implemented in most (and probably all) statistical packages that do ANOVA.
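As a purely illustrative sketch (the condition means below are invented, not the reported data), applying a trend contrast is just a weighted sum of the condition means, and the linear and quadratic vectors are orthogonal:

```python
# Orthogonal polynomial contrasts for 4 equally spaced conditions,
# as given in the response above. Condition means are hypothetical.

linear    = [-3, -1, 1, 3]
quadratic = [ 1, -1, -1, 1]

def contrast_value(means, weights):
    """Contrast estimate: weighted sum of the condition means."""
    return sum(m * w for m, w in zip(means, weights))

# The two vectors are orthogonal: their dot product is zero.
assert sum(l * q for l, q in zip(linear, quadratic)) == 0

means = [4.0, 2.5, 1.5, 1.0]             # e.g. ILMarea across four delays
print(contrast_value(means, linear))     # -10.0: a decreasing (linear) trend
print(contrast_value(means, quadratic))  # 1.0: a small curvature component
```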

 

In general there's a slightly odd mix of parametric and non-parametric modelling. The data on the left side of Figure 2 has simple parametric solutions. And the right column would be Cauchy(?) or two exponentials. Anyway, they look very much like solvable problems so I'm not sure why parametric approaches aren't taken here like they are elsewhere. And let's not be too tied to the brief history of ILMarea calculation as a justification. The mix of approaches without any consideration for the change in meaning of the analyses is a bit untoward and I think should be accompanied by some comment at least.

Again, we did look at fitting functions to the percept functions and have examined other potential measures in Han et al, 2016.  In the end, we concluded, and recommended, the area measure as the most direct quantification of the observable effect of interest (the gap between the percept score functions), one that does not require the introduction of even more "researcher degrees of freedom" (i.e. how good a fit is required?  What function will you use as the model?  And so forth).  In addition, if one fits a psychometric function to the individual participant data, say to obtain the PSE and analyse those values, the mean of the participant PSE values is not the same value as the PSE from the group mean percept curves.  This means the PSE that one sees in figures depicting the group mean percept scores will not coincide with the reported means of the analyses.  With the area measure, however, the group mean of the individual areas is the same as the area of the group mean scores, so the reported means correspond to the area visible in the data figures.  This does not mean, nor should it be taken to mean, that analysing the PSE as a primary measure should never be undertaken should it be more theoretically related to a specific research question.  For the current investigation, however, the area measure is sufficient to answer the questions of interest, and given the correspondence between the reported estimates of the population values and what can be seen in the figures presenting the percept scores, this is another pragmatic factor in the initial preference for quantifying the illusions using this measure.
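The property leaned on here, that the group mean of the individual areas equals the area of the group mean curve, holds because the area is a linear function of the percept scores. A minimal sketch with invented data, using the trapezoidal rule as a stand-in for however ILMarea is actually computed:

```python
# Area is linear in the scores, so averaging before or after computing
# the area gives the same number. All values below are invented.

def trapezoid_area(ys, dx=1.0):
    """Trapezoidal-rule area under a sampled curve."""
    return sum((a + b) / 2 * dx for a, b in zip(ys, ys[1:]))

# Percept-score difference curves for three hypothetical participants.
participants = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 2.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

mean_of_areas = sum(trapezoid_area(p) for p in participants) / len(participants)
group_mean_curve = [sum(col) / len(col) for col in zip(*participants)]
area_of_mean_curve = trapezoid_area(group_mean_curve)
print(mean_of_areas, area_of_mean_curve)  # the two quantities coincide
```

A PSE, by contrast, is a nonlinear function of the curve, which is why the mean of individual PSEs need not match the PSE of the group mean curve.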

 

Regardless, once the recommendation for the area measure was made, we have stuck with it to ensure maximal consistency from study to study, in part to avoid any suggestion that such decisions were made post hoc (again, to avoid questions of p-hacking type behaviour – the random switching, for example, between reporting means, means with outliers rejected, medians, etc., is often a red flag to me suggesting that all were examined and the “best” outcome is being reported.  If this lab switched from the area measure, to the PSE, to the consistency measure, from study to study, the same red flag should be raised by readers).  Again, there is always scope for investigating what quantities could be examined, but every time a new study arbitrarily quantifies a phenomenon with a different measure, there is a concern about whether or not the new measure is measuring the same underlying “thing” as the studies in the literature it attempts to test.  By utilizing a common metric over multiple studies, there can be no concern about whether or not the current quantity and the published quantity are related.  While the area measurement was introduced in Crawford et al. (2010), we have used it in 5 other publications since 2016, all employing the standard protocols as used here.  This may be a brief history, but it is, to our knowledge, the most consistent series of studies of the various forms of ILM from a methodological viewpoint.

 

And the decision times, as we’ve presented, are described as distance decay functions away from the PSE. 

 

On a similar note, I'm wondering if the log of the SOAs results in linear effects in Fig. 3 (just put on log graph). It's very common for SOA effects within a reasonable range to have a log relationship.

Yes, actually, thanks for this suggestion.  When the SOA values are converted to ln(SOA), both ILMarea and the dtce produce linear functions (with r2 > 0.99 in both cases).  We report this, along with the corresponding regression equations, in the text.  We continue to report the trend analysis based upon the non-transformed delay conditions, as one of the requirements of trend analysis is that the values be equally spaced along the x-axis dimension.  The chosen SOAs differ in equal steps of 150 ms, and so qualify, but the natural logs of these values, of course, do not.  However, knowing this, choosing SOAs that are equally spaced on a log scale is certainly going to be high on the list of future design choices.
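The spacing point is easy to check numerically. In this sketch the four SOAs are the delays used here, while the ILMarea values are invented solely to illustrate the kind of ln(SOA) fit described (they are not the reported data, although the manuscript's actual fits give r² > 0.99):

```python
import numpy as np

# The four delay conditions step in equal 150 ms increments, which is
# what makes them valid for trend analysis on the raw scale...
soa = np.array([16.7, 166.7, 316.7, 466.7])          # ms
assert np.allclose(np.diff(soa), 150.0)

# ...but the natural logs of those SOAs are far from equally spaced.
log_soa = np.log(soa)
assert np.ptp(np.diff(log_soa)) > 1.0                # spacing varies widely

# Hypothetical ILMarea values declining with delay (made-up numbers,
# for illustration only).
ilm_area = np.array([0.95, 0.55, 0.42, 0.36])
slope, intercept = np.polyfit(log_soa, ilm_area, 1)
r_sq = np.corrcoef(log_soa, ilm_area)[0, 1] ** 2
print(f"ILMarea ~ {slope:.3f}*ln(SOA) + {intercept:.3f}, r^2 = {r_sq:.3f}")
```

Choosing future SOAs as a geometric series (equal steps in ln units) would make the same data set valid for trend analysis on the log scale directly.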

 

The authors tend to say that motion is cancelled. This is unlikely to be strictly the case. The responses participants can give are only left and right and there is no option to say the line came on all at once or for any other kind of motion.

 

-        The above seems to suggest that “cancel” can only mean a complete cancellation (reduction to exactly zero), when any reduction qualifies as cancelling some of the signal.  We’ve rephrased sections with this in mind, along the lines of saying things like “cancelling to various degrees”, etc., to ensure that our intention is conveyed.

-        The inclusion of a 3rd “no motion” response has been explored (see Han, Zhu, Corballis, & Hamm, 2016); in short, this shows what you would expect if the motion signal is reducing towards zero – its use peaks around the PSE, but otherwise it adds nothing to the interpretations and, in fact, may have a slight negative impact upon the ease of interpretation of the decision time data, as a 3AFC paradigm involves two response boundaries – one between deciding left or none, and another between right and none.  While there are interesting aspects to explore with regards to the position and number of response boundaries, these are quite different lines of investigation.  The 2AFC paradigm is simpler and leads to the same explanation, and the resulting percept scores are identical to what one obtains with the 3AFC paradigm (where no motion is scored as 0).  Moreover, we have discussed in a few past articles (starting with Crawford et al., 2010) alternatives such as “sometimes one sees the real motion to the right, and sometimes one sees the illusion to the left, and the PSE is just when those tendencies are equal”; that explanation fails because it doesn’t fit the decision time patterns – if at the motion PSE one is seeing either illusory motion or real motion on every trial, then the decision time would be the average of the decision time to the illusion (i.e. flash and no real motion trials) and to the real motion with no illusion (the no flash trials), but as one approaches the PSE the decision times become slower than both of those, and tend to reflect the decision time of the no real motion and no flash trials.  We’ve reiterated that in the discussion here as well.



Further, actual percepts of simultaneous lines are rare in these experiments, and oftentimes there is a very weak perception of motion left, right, or colliding in the middle somewhere. It would be much more accurate to say that the consistent direction of motion is reduced (maybe even "cancelled" works there if "consistent" is included). Then they're leaving open the very highly likely case that some motion of some kind is perceived nearly all of the time. And, it avoids strawman papers later that start, "So and so said line motion is cancelled, eliminated, completely removed from perception..."

 

Ah, yes, the motion signals we are discussing as being “cancelled” are those generated either by the flash (the illusory motion signal) or by the real motion, and not “all forms of possible motion”, such as that which arises when there is no flash and no real motion (gamma motion and/or polarized gamma motion, for example).  Also, it must be remembered that the estimates derived over a number of trials are estimates of a general tendency to see motion in a given direction.  It’s possible that a motion signal, even a directional one, is perceived each and every time, as the above suggests, but the decision time data indicate that, even if so, that motion signal is weaker (results in slower decisions) than either the illusion (flash and no motion trials) or real motion (no flash but motion trials), and is also random with respect to direction (so the percept scores average towards zero).  The relatively infrequent use of a “no motion” response (Han et al., 2016), when it is available even on no flash and no real motion trials, suggests that there is a general tendency to report motion.  That “noise” motion may continue to exist, but the directional signal (generated by the illusion and/or real motion) is what is being cancelled/reduced.

 

To make this unambiguous, we have included the following in the opening paragraph of the discussion of E1:
“Note, at the PSE on a trial by trial basis, there may be weak signals of randomly left or right motion reflecting typical variation, or there may be other non-directional forms of motion, such as PGM from both locations creating “inward motion”, or an outward expansion from the centre of mass (gamma motion), but neither of these motion signals leads to a consistent response choice and so they are not under investigation.  For clarity, reduction of motion signals here should be taken to mean a reduction of the directional motion signals generated by the illusion and by the real motion present in the stimulus, and not to mean the absolute absence of any sense of jitter or multidirectional motions.”

 

l.254+ There's unnecessary redundancy here given apparatus and stimuli are identical to what is written for E1 and the procedure is substantially the same. Please shorten and also add in the differences in stimuli.

 

Indeed, thank you for pointing this out.  We have edited the presentation in E2 along the lines suggested above.  We’ve left the presentation of the stimuli, as there were slight differences in the display dimensions; otherwise we have reduced the presentations to highlight only how E2 differs from E1.

 

While the reference to Han et al. is handy, remind people in this manuscript that ILMarea can be negative and is always calculated relative to flash locations and not necessarily a hypothesized line direction. This is especially important given the second line could be hypothesized to reverse.

We have included in the introduction the following: “A reversal of the illusion is indicated when ILMarea is negative [11,12], which occurs because ILMarea is always calculated as the area under the percept curve following a left inducer minus the area under the percept curve following a right inducer.” While we’ve not gone into great detail about the specifics of why this is the case, we have directed readers to articles where they can find more specific details.

 

l. 361+  This is just wrong. The authors are concluding no differences between experiments based entirely on null results of the tests. This is essentially concluding no differences based on nothing, since the test can't be used to decide that the null is true. There is all kinds of information here that could assist in concluding no difference. At least make some qualitative statements about similar patterns of results for starters. Boost those by showing confidence intervals that (hopefully) suggest that if there are between-experiment effects they are not very large. But this whole section needs to be excised as it stands because none of it does what it's purported to do at the opening. If there must be a test, there are tests that allow one to conclude no difference but I'm not a fan of them.

 

There is as of yet no consensus on how best to deal with the logical complications that arise from the fact that theoretical interpretations involve making definite statements – statements that are clear and potentially falsifiable (Popper, 1959; 1963) – such as “an effect exists” or “an effect does not exist”.  The evidence we gather in our attempts to falsify a hypothesis is based upon statistical analysis of data, and statistical analyses (NHST, Bayesian, and all other approaches) provide only probabilistic evidence, which is never definitive.  One always risks being wrong, either by stating that an effect exists despite there always being some possibility that the evidence arose through sampling error, or by stating that no effect exists despite there always being the possibility of a failure to detect it.  Because statistical analysis provides a probabilistic evaluation, one can always throw up the argument that a smaller effect than one has the power to detect exists, just as one can always throw up the argument that a given finding was a false positive.  In other words, while all argue strongly that the null is never proven, by the same argument it is also never disproven by probabilistic evidence; there always remains, no matter how vanishingly small, the possibility that a “demonstrated effect” is a false positive.  Either we abandon all hope of furthering our understanding, or we accept the fact that sometimes our interpretations will be wrong.  Being willing to risk being wrong enables one to present an interpretation as a definite statement that is open to falsification when the same rules of interpretation are applied.  If we “guessed right”, those attempts to falsify will themselves tend to fail (tend, because of course we might have guessed right and the falsification attempt ends up getting the wrong answer from its probabilistic evidence).

While the reviewer may, or may not, have other preferences, our approach to this dilemma has been to utilize both NHST and Bayesian approaches to statistical analysis and interpretation. Confidence intervals, and other such things, do not get around the problem of having to make specific theoretical claims about effects being true or not based upon evidence that is only ever probabilistic, because they are simply another way to represent the mean and variation from NHST (and they are almost universally misunderstood, particularly the within-participant versions).  Using both NHST and Bayesian assessment allows one to assess the accuracy of a specific statement (the null, which specifically says there will be no difference in measurement other than what can be accounted for by measurement variation), while the Bayesian analysis provides an assessment of confidence to guide the surety of one’s presentation.

We therefore have included the following section at the end of the introduction to lay out our approach to the statistical analysis and interpretations that it affords, noting the use and purpose of both NHST and Bayesian type analysis.

 

“           These hypotheses will be examined using both null hypothesis significance testing (NHST) and Bayesian analysis.  NHST is an assessment of the accuracy of the predictions derived from the null hypothesis.  The null is deemed to be inaccurate when the observed data is deemed improbable to occur when one starts with the assertion that the null is true.  The assessment of improbability is reflected in the standard p value.  Failure to reject the null, it must be remembered, is not a claim that the null is true, only that any differences between the observed data and the null prediction could reasonably have arisen due simply to measurement variability around a common population mean.  In other words, the null could be false despite non-significance and there could be additional effects, but if that is the case then those effects are presumed to be of insufficient magnitude to be detected under the current conditions.  Therefore, their existence can only be hypothetical speculation, currently unsupported by the data.  Increasing sample size is one way to reduce the range of hypothetical non-detectable effects, as this increases statistical power, but the range can never be reduced to zero.  The flip side is also the case, in that there is a known probability of rejecting the null hypothesis in error, and so significance is never a guarantee that a proposed effect is true, only that the data is unlikely to have been obtained if there is, in fact, no effect at all.  We will discuss effects as having been demonstrated should the null hypothesis be rejected, but when the null has not been rejected, we will present these as situations where any possible differences have fallen below detectable levels.

Bayesian probabilities can be determined from analysis of variance and correlations [23] or from t-tests [24].  The Bayesian probability, denoted as pH0|D, is an assessment of the strength of the evidence for or against the null hypothesis, and it can be converted back to the Bayes Factor (BF) odds ratio simply by BF = pH0|D/(1-pH0|D).  When used in combination, rather than as competing, mutually exclusive methods, these two approaches provide clearer guidance on the theoretical interpretation of experimental evidence.  While NHST p values tend to be described as either significant or non-significant, with some occasionally using phrases such as “marginally significant” (which really should be phrased “marginally non-significant” if it is to be used at all), Bayesian pH0|D values tend to be described using more graded language.  We will employ the descriptions suggested by Raftery [25], with the addition of the “equivocal” description for the range (0.475-0.525) [9,13] if necessary.  The Bayesian analysis may be viewed as guidance towards future investigations of hypothetically possible effects that may exist but are below detectable levels.”
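The pH0|D-to-BF conversion quoted above is just an odds transform, which a one-line sketch makes explicit (the 0.60 input mirrors the weak-evidence example discussed elsewhere in this response; a pH0|D of 0.50 sits at the equivocal 1:1 point):

```python
def bayes_factor(p_h0_given_d: float) -> float:
    """Convert the posterior probability of the null, pH0|D, into the
    Bayes Factor odds ratio: BF = pH0|D / (1 - pH0|D)."""
    return p_h0_given_d / (1.0 - p_h0_given_d)

# pH0|D = 0.60 -> odds of 1.5:1 in favour of the null ("weak" evidence
# on Raftery's scale); pH0|D = 0.50 -> 1:1, i.e. equivocal.
print(bayes_factor(0.60))   # ~1.5
print(bayes_factor(0.50))   # 1.0
```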

-        We have also demonstrated this in practice by adding the following statement at the end of the between-experiment comparison: “In short, there was no evidence to suggest there was any substantial influence of the changes in protocol with regards to either measure.”  Note that this sentence makes it clear we are not claiming to have proved no effect (we are not “proving the null”; that is impossible), only that any effect, if present, is not substantial enough to warrant calling this a failed replication.  Also, given the recent concerns about how replicable effects in psychology are, we think it is important to present this analysis to demonstrate not only that common patterns of differences were found, but that the differences in reported mean values, and how they changed between delays, may reflect nothing more than the naturally occurring levels of variation around a common population mean.

-        The theoretically most important finding in this between-experiment analysis would be the interaction between delays 1 and 2 (16.7 & 166.7 ms) and experiment.  NHST indicated that predicting no interaction (the null) does not result in inaccurate predictions.  The Bayesian probabilities suggest the null hypothesis is twice as likely to be true as a non-null hypothesis.  While this is considered only weak evidence in favour of the null according to Raftery, it is sufficient for referring to this as a successful replication of the values.  With regards to the primary questions of investigation, none are contingent upon a null interaction being or not being true, but the lack of evidence for an interaction suggests the measures are relatively stable.  For the necessary effects, like the reduction in flashILM with SOA, the null predictions (no reduction) are shown to be inaccurate and the evidence is very much in favour of these effects being true, despite the fact that there is always some possibility they are false positives – again, not only is the null unprovable, so too are actual effects, as they also have a non-zero probability of being false positives.  Error bars, or CI95 bars, do not overcome any of this, as they are simply graphical representations of means and variation rather than numerical calculations.  The combination of NHST and Bayesian analyses, however, provides a measure of accuracy and evidence weight, which we think is probably the best we can do.  Again, we recognize no consensus exists on this fundamentally important issue other than that something must be done, and this combined NHST and Bayesian approach is ours.  We accept that the reviewer may be of a differing opinion.

-        Finally, given that this between-experiment analysis reproduces the difference between delay 1 and 2, and the lack of any firm evidence to suggest that this difference varies between experiments, we feel that presenting the statistical analysis is far preferable to simply giving a qualitative description of similar patterns (which would just be to say something like “both experiments show a significant decline in the measures between delay 1 and 2”).  In contrast, we are checking to see if there is any evidence to suggest the “rate of decline” between delay 1 and 2 differs between experiments, and no evidence was found to indicate that.  Without testing this interaction, a qualitative description as per the above would in effect imply there isn’t one, which strikes me as far worse than reporting that we actually tested for one and didn’t find it!  Again, as described in the first section, what we are doing is testing whether describing E2 as a successful replication of E1 could be considered an inaccurate description, and the results indicate that the description is not inaccurate – but, to reiterate, nothing ever indicates any finding is absolutely true.  That is the risk we all have to take whenever we move from the probabilistic outcomes of a statistical analysis to the presentation of testable and falsifiable theoretical statements.  And if someone were to run a much larger study replicating what we’ve done here, and were to find that small effects exist where we have not detected them, none of the central claims being made here would change, and a fine-tuning of tangential aspects would result – which is how scientific research is able to derive more and more precise theories.

-        We appreciate the important issue that the reviewer is highlighting here, and it is one that features in much of the discussion of replication concerns.  Interestingly, those concerns tend to focus on the inability to reproduce reported significant findings, highlighting the above point about false positives.  However, for many of the reasons alluded to above, we think the presentation of this analysis between the experiments, and referring to it as a replication analysis, is completely appropriate.

 

And to follow, on a related note, the claims of effects going away, being eliminated, or gone, really need to be tempered into statements of reduced effects, or no longer detectable effects. Reported confidence intervals could at least support arguments that effects are either 0 or inconsequential (if they can support that argument). But as it is we don't know if the range of likely mean ILMarea values (for example) includes meaningful amounts that make claims of "no effect" troublesome. In short, it's critical to focus on actual estimates wherever one makes a claim of no effect and, since the estimated mean must be incorrect with continuous data, a plausible range of estimates needs to be used to support arguments of no effect.

 

-        We have, as per below, rephrased these statements along the lines of being “no longer detected” (thanks for suggesting that phrasing; we agree that it’s a better description of non-significant findings, and have incorporated it in our description of the analysis approach at the end of the introduction as well as elsewhere).  We have also pointed out in the introduction that a reduction in the illusion (either as SOA increases, or in the double-change condition) is the critical prediction, though we note that based upon Hamm & Klein’s finding, where cuing effects did not significantly differ at either end of the ILM bar, it is possible that the double-change condition will render the illusion non-detectable.

 

This also brings me to unnecessary tests. For example the tests against 0 starting at l.318 are superfluous and used to bolster an argument in an unwarranted way. If it's already shown that the conditions differ then minimally, given this pattern of data, the early condition must differ from 0 and a test is redundant. Further, the fact that the double condition does not differ from 0 is not evidence that it is 0. This requires a better argument that, at the very least, includes arguing that the plausible effects there could be aren't really meaningful or consequential values. (As an aside, such discussion would eventually result in the authors needing to make qualitative statements about what large or small ILM area values are, which would be a good thing for the primary users of these measures to write.)

-        The Bayesian values for the t-tests are now included, which weight the strength of the evidence for or against the null hypothesis, using the phrasing from Raftery (1995) with regards to the classifications (“weak”, “positive”, etc.).  We’ve chosen to present Bayesian pH0|D values rather than confidence intervals because the latter are almost universally misinterpreted, particularly the within-participant design versions, which these would entail.  The above tests are necessary with regards to discussing whether or not ILM was detectable at each delay.  While it was not essential to any of the hypotheses that ILM be completely eliminated (meaning non-detected), only that it be reduced, it is still important to be able to discuss whether that reduction was such that ILM was detected or not, even if for no other reason than to inform future research that might involve testing for ILM at longer intervals.  As per the above suggestion, we have rephrased terms such as “eliminated” along the lines of “non-detectable”.  We have also included in the discussion references to how the null predictions were considered accurate and the general weight of the evidence is in favour of the null, to emphasise the information gleaned from the two-analysis approach.  The critical prediction derived from Hamm & Klein (2002) is that the initial instance of flashILM should spread attention, and therefore reduce any subsequent instance of ILM, and the findings are strongly in line with these predictions.  We note here, for example, that in the double change condition for the area measure the weight of the evidence favours the null at pH0|D = 0.60, or favours the null over the alternative at a ratio of only 1.5:1, which we agree is best described as “non-detectable”, as this odds ratio is certainly not definitive. Importantly, it is not necessary to any of the arguments that the illusion be absolutely non-existent.
The important point is that the initial ILM is thought to redistribute the gradient so that attention is more evenly spread out, and so, if flashILM is related to this gradient, as the attentional explanation states it is, then the subsequent ILM should reduce.  The data strongly show that it does.

 

-        For an example of our rephrasing: in the opening sentence of the 2nd paragraph of the discussion, where we stated “Critically, during the double change condition of Experiment 2 there was no longer any evidence for flashILM at the late interval.”, we have rephrased to “Critically, during the double change condition of Experiment 2 the null prediction was not inaccurate and the evidence was weakly (ILMarea) or positively (decision time congruity effect) in favour of the null at the late interval, indicating flashILM has been reduced to non-detectable levels.”  And at the close of this paragraph we have rephrased “…such that the second colour change no longer occurs along the attentional gradient.” to read “…such that the second colour change occurs along a much less asymmetrically distributed attentional gradient.”

-        And later, the statement “The findings from Experiment 2 showed that flashILM did not occur when the target colour change was preceded by a distracter colour change.  ” has been changed to read “The findings from Experiment 2 showed that flashILM fell below detectable levels when the target colour change was preceded by a distracter colour change.”

Similarly, the passage “According to Hamm and Klein [1]’s spreading of attention theory, following the presentation of the distractor colour, exogenous attention should no longer be asymmetrically distributed towards one end of the display as the flashILM generated by the distractor colour would redistribute attention. Therefore at the late interval when the target colour change occurs, the colour change no longer occurs along a gradient of attention.”
has been rephrased to:
“According to Hamm and Klein [1]’s spreading of attention theory, following the presentation of the distractor colour, exogenous attention should no longer be as strongly asymmetrically distributed towards one end of the bar as the flashILM generated by the distractor colour would redistribute attention more evenly.  Therefore, at the late interval when the target colour change occurs, the colour change occurs along a much reduced gradient of attention.”

etc

 

l.526 “forth” → “fourth” – thank you, corrected

 

Fig. 8B, perhaps the y-axis label could be effect in ms? Also, perhaps these signal strengths should be on the same scale. – Thank you, yes, fixed.

 

Fig. 9 and 10 x-axis labels are a mess. Higher quality figures overall are needed here. – Fixed

 

The whole speculation on consciousness minimally needs one more thing: a speculation on alternative models that's a bit more in depth. I'm not sure how this special issue is being conceived by the editors so I'm not going to suggest it be removed. I'd like to see it expanded with some assessment of the fragility of the proposal. What if something as simple as failing the assumption of equal variance breaks it? I'm not saying it's bad but it seems like a germ of an idea that perhaps deserves a deeper exposition in its own paper.

 

-        While we agree fully that comparison with other models would be interesting, this would greatly expand the manuscript and runs into the problem of deciding which alternative model to use as a comparison (Ratcliff’s diffusion model comes to mind, but there are, of course, others).  Moreover, as indicated, the ideas presented are, indeed, entirely speculative, intended at the time to be an exploration of how one might go about testing a change of state model based upon distributions of response times that would be hypothesized to be a mixture of both states.  We have, however, added the following just after presenting the binned decision times (which show a smooth distance decay function) and percept scores (Figure 8): “The smooth progression of both measures could be taken to reflect influences of increasing signal strengths with no evidence indicating any change in performance as would be expected if there were a change in state at some point between the weakest and strongest signals.  With awareness likely to be absent for the weakest signals but present for the strongest, this pattern could be indicative of a “glow point” type of threshold with regards to awareness.”  After this we point out that the focus is to determine if these smooth functions can be accounted for by a change of state model, and end with “This is the focus of the following discussion; it is a proof of concept exploration as to whether or not a change in state model can account for the observed data pattern.”  This ensures it is clear that we are examining the feasibility of a change of state model, not comparing it to a specific, or general, “non-change of state” model that one might presume is indicated by the initial data pattern.

-        We are also very clear in pointing out that the starting assumption that the distributions do not change shape is simply a starting premise upon which the rest of the analysis is based, and we reiterate that point when the analysis is described, to highlight where it is applied.  And yes, if this starting assumption is wildly incorrect then one would reject the results it produces; it is an aspect of what we are proposing that is testable.  Because this assumption is a testable hypothesis, we are effectively willing to present this idea in a way that allows it to be empirically challenged and tested – at the moment that is well beyond this initial presentation of the idea.  However, if this assumption is incorrect, it is a remarkable coincidence that the analysis that follows from it produces an outcome that is entirely what would be expected from all the noncontroversial predictions and premises that are the basis of a two-state model; we therefore suggest that these outcomes were the first chance for this idea to fail, and it has passed.  We have included, however, a bit of theoretical discussion on why such an assumption could be considered plausible – in short, we suggest that the variation in decision times that would arise on a trial by trial basis if there were no variation in the starting state (i.e. if one could “reset” a brain to be as nearly as possible exactly as it was on the previous trial, much as one can do when applying heat to a 2nd test wire) is minimal compared to the variation in the brain’s state that occurs when actually presenting stimuli to living participants.  The distributions of response times, therefore, may primarily reflect this starting variation, which should be unrelated to the stimulus conditions.
Therefore, when a larger signal is presented, the individual trial decision time will primarily reflect variation in initial brain states, with the mean shifting based upon the stimulus input, which has generally been controlled by the experimental set-up.  The distribution shape will thus remain unchanged while the overall mean may shift.  A difference in the shape of the distributions due to a change in state arises because, once a change in state occurs, this will have an impact upon the underlying brain activity, but the same principle applies.  There may be other hypotheses that could also postulate such constant distribution shapes, and we fully acknowledge that this assumption calls out for empirical testing to see if it stands up to scrutiny.
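The premise can be illustrated with a small simulation sketch (all distribution shapes, means, and mixing weights below are hypothetical, chosen only to show the mechanics of a two-state mixture; none are fitted to any data): each state contributes decision times from a fixed-shape distribution, and smooth changes in the observed mean arise solely from the mixing proportion shifting with signal strength.

```python
import numpy as np

rng = np.random.default_rng(0)

def state_rts(mean_ms, n):
    """Decision times for one state: a fixed (ex-Gaussian-like) shape
    whose location, but not shape, depends on the state."""
    return rng.normal(mean_ms, 40.0, n) + rng.exponential(60.0, n)

n_trials = 5000
p_aware = np.linspace(0.05, 0.95, 7)   # P(aware state) vs signal strength
mean_rt = []
for p in p_aware:
    n_aware = rng.binomial(n_trials, p)
    rts = np.concatenate([state_rts(450.0, n_aware),             # aware state
                          state_rts(650.0, n_trials - n_aware)]) # unaware state
    mean_rt.append(rts.mean())

# The observed mean declines smoothly as the aware-state probability
# rises, even though neither component distribution changes shape.
assert all(a > b for a, b in zip(mean_rt, mean_rt[1:]))
print(np.round(mean_rt, 1))
```

The point of the sketch is simply that a smooth progression of the aggregate measure is fully compatible with a discrete two-state account, which is why the smoothness of Figure 8 alone cannot rule the change-of-state model out.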
 
We have therefore included the following section at the point in the discussion where we introduce this assumption:


The thinking behind this assumption is that trial-by-trial variation in response times is primarily due to the moment-by-moment variation in the brain activity into which the stimulus signal is introduced. Particularly fast decisions may reflect the chance occurrence of a stimulus being presented when all of the systems through which stimulus processing proceeds happen to align in a beneficial way, and particularly slow decisions may reflect occasions when more of those systems are in a state that prolongs processing. As a change of state could affect a wide range of brain activity, it is plausible to suggest that different states will produce different distributions of response times.

We feel that anything more than this would be too speculative at this point, while the above provides sufficient detail to allow for empirical testing. Most importantly, at this stage it is less important to explain why the response time distributions should not change shape than to acknowledge that this is a premise of the proposed approach to the analysis. If this premise is false, then the analysis that follows is unsafe; but rather than viewing this as a "fragile" aspect of the presentation, we consider it one of its greatest strengths, as it is specific and testable, ensuring that we are making a falsifiable suggestion. Furthermore, we have pointed the reader back to all of the premises at the points in the discussion where the data outcomes correspond to the premises of a change-of-state model. It is the constant distribution shape that is used in the analysis; from that, the other premises refer to expected outcomes in the data, and the data pattern is consistent with all of them.
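The constant-shape premise lends itself to a simple empirical check of the kind the authors invite. As a minimal, purely illustrative sketch (the generative distribution, the sample size, and the 30 ms mean shift are our own assumptions, not values from the study), one can simulate trial-by-trial response times whose shape is fixed by the "initial brain state" while only the mean shifts between stimulus conditions, and then verify that the mean-centred distributions are essentially indistinguishable:

```python
import numpy as np

rng = np.random.default_rng(0)

def baseline_rt(n):
    """Trial-by-trial 'initial state' variability: an ex-Gaussian-like
    RT distribution, identical in shape across conditions (illustrative)."""
    return rng.normal(400.0, 40.0, n) + rng.exponential(60.0, n)

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max ECDF difference."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    fa = np.searchsorted(a, grid, side="right") / a.size
    fb = np.searchsorted(b, grid, side="right") / b.size
    return np.abs(fa - fb).max()

n = 5000
rt_small = baseline_rt(n)          # weaker-signal condition
rt_large = baseline_rt(n) - 30.0   # stronger signal: pure mean shift

# Means differ, but after centring each condition the shapes coincide.
mean_diff = rt_small.mean() - rt_large.mean()
d_centred = ks_stat(rt_small - rt_small.mean(),
                    rt_large - rt_large.mean())
print(round(mean_diff, 1), round(d_centred, 3))
```

Applied to real response time data, the same comparison after centring each condition would give the constant-shape premise a concrete opportunity to fail, which is exactly the falsifiability the authors emphasise.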

-        Research into consciousness is filled with theoretical discussions, few of which lead to testable or falsifiable statements. We believe that the presentation of this formulation of a change-of-state description of consciousness provides more than sufficient fodder for empirical work. We have gone to great lengths to ensure our presentation is not misconstrued as anything more than a hypothesis with promise, and we welcome, even encourage, its empirical investigation, either on its own or in comparison with an alternative "glow point" model. We feel, however, that if we were to present such an alternative ourselves, there would be only two possible outcomes: either our alternative glow-point model fails to account for the data, or it also accounts for the data. The former could simply mean that our alternative inadequately describes a glow-point type of consciousness. The latter would leave us with two competing explanations requiring further testing to differentiate, which would in turn require designing a test of those two situations, and so on. This goes well beyond what the current presentation is meant to accomplish, namely a "proof of concept" that a change-of-state model can account for a wide range of the behavioural data, including response time distribution shapes. With only one potentially controversial premise, which we acknowledge as such and present in a way that allows this approach to be falsified (its worth will be reflected in how well it resists falsification attempts), the model provides a great deal of food for thought for those who research, or are interested in, consciousness.

 


Round  2

Reviewer 1 Report

The authors' response assuages my concerns about a possible nulling role of ILM. 

Author Response

We thank the reviewer for their time and suggestions.
