1. Introduction
Digitalization of tax administration promises efficiency and transparency, but success depends on widespread user take-up and sustained use [1,2,3]. Governments are therefore expanding e-invoicing and electronic bookkeeping; in Greece, the Independent Authority for Public Revenue (AADE) rolled out myDATA as a national digital tax-reporting infrastructure. For accounting practice, such systems do not merely digitize communication with the tax authority: they reconfigure routine processes of invoice issuance, transaction recording, and periodic reporting, with downstream implications for reporting timeliness, error rates, and the evidentiary trail that supports audit and enforcement. Implementation is often constrained by information gaps, uneven readiness across firms, and operational strain around deadlines. Evidence from the UK’s Making Tax Digital suggests that early awareness can be limited: surveys found that many small firms initially did not know about the new digital record-keeping obligations [4,5,6,7]. When knowledge and training lag behind rollout schedules, agencies face late rush-to-compliance dynamics and spikes in support demand, conditions under which misconfiguration and reporting errors become more likely [4,5,6,7,8].
Public attention is therefore not just a by-product of implementation but a mechanism shaping adoption and compliance outcomes. Behavioral public administration and applied economics show that salience and timely communications can alter citizen responses and uptake, including via low-cost informational interventions [9,10,11,12]. Field evidence suggests that reminders and peer-use cues can substantially increase adoption among initially reluctant users [13,14,15,16,17,18]. However, showing that “announcements matter” is not sufficient: for practice and theory, the key questions are which rollout milestones shift attention, whether changes are abrupt or gradual, and whether attention differs across parts of a digital compliance ecosystem [18,19,20].
Digital trace data provides a practical way to observe these dynamics at scale. Google Trends has been widely used as a proxy for public awareness and issue salience [20,21], and search activity often responds strongly to pre-announced deadlines and policy shocks [21,22,23,24]. In the myDATA context, the goal is not merely to document spikes but to interpret attention as a measurable intermediate signal in the accounting workflow: increased searching plausibly reflects information acquisition and task initiation (e.g., registration, software choice, configuration, issuance rules, submission procedures) that precede, though do not guarantee, changes in actual reporting behavior and compliance quality. This perspective matters for accounting and auditing because attention peaks can coincide with periods of heightened process change and learning, when the risk of late reporting, corrections, or inconsistent record-keeping may increase.
A second substantive question is what users attend to when interest rises. myDATA is an ecosystem comprising the AADE portal, the “Timologio” application, third-party solutions, and broader e-invoicing compliance concepts. We therefore expect distinct “families” of search terms (platform/authority, application/tool, and ecosystem/compliance) to respond differently to the same milestone, revealing where information needs and friction concentrate [25,26].
This paper targets a gap at the intersection of digital tax infrastructure, accounting process change, and forecasting with digital traces. Prior e-government work largely studies adoption drivers and interventions, while time series research shows that Google Trends can track—and sometimes help predict—collective information demand. These strands are rarely integrated in a staged national rollout where bookkeeping routines, invoicing workflows, and compliance documentation are being reconfigured in real time.
We contribute a preregistered, fully reproducible event study of Greece’s myDATA/e-invoicing rollout that (i) estimates attention shifts around prespecified milestones using an interpretable step/ramp coding with HAC-robust inference and false-discovery control; (ii) compares responses across query families that map to distinct interfaces of the ecosystem (platform, app, and broader e-invoicing); and (iii) evaluates whether policy calendar features improve 1–3-month nowcasts relative to seasonality-only benchmarks, with bounded operational relevance for timing guidance and sizing support capacity. Crucially, search attention is treated as an intermediate signal of information frictions, not a policy success outcome: we do not observe filing behavior, platform usage, or audit findings. Instead, we use attention dynamics to indicate when and where implementation pressures are most likely to surface, motivating linkage to administrative metrics in future work.
2. Literature Review and Related Work
The first question is why search attention is worth studying at all. When governments introduce technical compliance requirements, affected stakeholders often seek guidance immediately (via accountants, vendors, professional networks, and the web), making search activity a timely proxy for information demand [14,15]. Although searches are not compliance, they can function as an operational leading signal: spikes in queries about deadlines or procedures can precede surges in helpdesk contacts, onboarding activity, and “last-minute” scrambling that appear later in administrative data. Because attention is typically brief and volatile, monitoring searches offers a practical way to assess whether milestone communications reach the public during the narrow window when guidance is most likely to be absorbed [23].
This attention perspective aligns with agenda-setting and issue-attention theories. Political communication research emphasizes that media and institutional cues shape what people attend to, even if they do not determine what people believe [27,28,29,30]. Yet salience rarely persists: Downs’ “issue-attention cycle” proposes that public interest rises sharply and then fades as novelty dissipates and the costs of sustained engagement emerge [31]. In digital government contexts, announcements, deadlines, and mandate phases can play a role similar to that of “news events,” briefly redirecting limited attention toward action-proximal questions. Our study adopts this lens by treating prespecified myDATA milestones (e.g., go-live, phased mandates, harmonization steps) as salient cues and testing whether they generate short-run bursts and medium-run shifts in information-seeking [32].
A large cross-disciplinary literature supports the use of search data as an economic and behavioral indicator, particularly for short-horizon monitoring and prediction [33,34,35]. Incorporating Google Trends has improved nowcasts of diverse outcomes, and classic work shows that informative search frequencies can enhance prediction of contemporaneous indicators beyond models using only lagged official data [21]. Search activity can also lead behavior: web queries have been shown to forecast consumer demand in several settings, consistent with “revealed interest” preceding action [24,25,26,27,28]. At the same time, well-known failures such as Google Flu Trends highlight the risks of over-interpretation: media amplification, shifting search behavior, and model instability can generate spurious signals or exaggerated effects [13]. For this reason, our design emphasizes preregistration, prespecified event timing and functional forms, and triangulation across multiple query families to reduce sensitivity to idiosyncratic spikes [36].
Our substantive context overlaps with the growing literature on VAT digitization and mandatory e-invoicing systems (e.g., Italy’s SdI, SAF-T implementations, the UK’s Making Tax Digital), which primarily evaluates compliance, revenue, and productivity effects [18]. This work often documents positive fiscal impacts and improved record-keeping, alongside transition costs. However, it typically provides limited evidence on the pre-compliance stage (public awareness, information gaps, and communication dynamics), despite their importance for successful adoption. We contribute by focusing on this earlier mechanism, i.e., whether and when target populations exhibit measurable information demand around rollout milestones, as a complement to downstream outcome evaluations.
Methodologically, we draw on interrupted time series and event study approaches, where interventions are modeled as level shifts and/or changes in slope [26]. This distinction parallels “pulse” versus “carryover” reasoning in applied settings: some events plausibly induce abrupt jumps in attention (e.g., go-live announcements), whereas others alter trajectories more gradually (e.g., harmonization steps or phased expansion). Our contribution is to apply this logic in a preregistered manner, coding the timing and shape of interventions ex ante rather than searching for breaks post hoc, thereby limiting researcher degrees of freedom and supporting interpretable estimates of medium-run impacts [37].
A practical motivation is whether these event-conditioned signals add value for short-horizon planning. Forecasting research emphasizes that simple seasonal baselines are difficult to beat at 1–3-month horizons and that improvements are often modest unless structure is well specified [7,23]. We therefore prioritize interpretability: we compare a seasonal-naïve benchmark to transparent regression-based models that incorporate prespecified policy timing as step/ramp indicators, yielding directly communicable predictions in Google Trends units [4]. Auxiliary automated methods are treated as robustness checks and are kept separate from the main analysis.
Overall, this study contributes to research at the intersection of public attention and digital tax infrastructure in four ways. First, it provides a preregistered event study of search attention around a national e-invoicing rollout (Greece’s myDATA) using a prespecified set of milestones. Second, it quantifies effects in interpretable units (SVI points) and contrasts patterns across platform, app, and ecosystem query families. Third, it evaluates whether incorporating policy calendars improves short-horizon nowcasts relative to a seasonal baseline, addressing an operational question relevant to support planning. Fourth, it emphasizes transparency and reproducibility by releasing code, queries, and analysis outputs to facilitate replication and extension.
3. Background: myDATA and E-Invoicing in Greece
Greece’s myDATA (“my Digital Accounting & Tax Application”) is AADE’s digital tax-reporting infrastructure that operationalizes electronic bookkeeping (“e-books”) and supports e-invoicing. The reform aims to standardize and automate reporting, increase transparency, and reduce evasion by providing businesses and intermediaries (accountants, software providers) with a unified reporting interface. Because adoption unfolded through staged mandates and technical harmonization, information demand is expected to arrive in waves—rising around visible deadlines and mandate expansions and receding as workflows routinize.
We therefore preregister six rollout milestones that plausibly shift information demand and/or use:
- (1) myDATA production go-live (2021-10) [8];
- (2) B2G Phase 1 (2023-09), initiating compulsory e-invoicing for central government bodies [15];
- (3) Central administration full coverage (2024-01), modeled as a step change [26];
- (4) VAT–myDATA alignment (2024-01), modeled as a slope change reflecting gradual workflow convergence [25];
- (5) B2G extension to the rest of the public sector (2024-06) [25];
- (6) EU authorization for a domestic B2B mandate (2025-03) [16].

These are the public “beats” of implementation (press releases, circulars, deadlines) that should generate detectable shifts in information-seeking if they are salient.
To track attention, we use Google search interest and group queries into three families that map onto distinct points of interaction with the reform: platform terms (A; AADE/myDATA access points, e.g., aade, ααδε), application terms (C; invoicing app queries, e.g., timologio/τιμολόγιο), and ecosystem terms (D; broader e-invoicing topics/standards, e.g., the “Electronic invoicing” topic). This structure allows us to test not only whether attention moves at milestones but where it concentrates—on the official platform, the front-end invoicing tool, or the wider compliance ecosystem. Guided by this context, we ask the following:
RQ1: Are prespecified myDATA- and e-invoicing-related events (such as announcements, deadlines, or system updates) associated with discernible shifts in public attention, as measured by Google search trends?
RQ2: Can an event-aware structural model (one that includes features for these communications or policy events) improve short-horizon nowcasts of public interest compared to baseline models that capture only regular seasonal patterns?
RQ3: Which families of search terms show the most significant movements in response to the events? For example, are people searching more about the official platform and its use, the companion app, or the broader ecosystem and compliance requirements?
Figure 1 summarizes the implementation timeline from 2016 to the latest month and marks the six preregistered milestones; the two January 2024 events are shown as distinct step versus slope interventions to avoid conflation and to provide a consistent reference for the results figures.
4. Methods
4.1. RQ1: Event-Study OLS
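We estimate event-driven changes in monthly Google Trends attention using a transparent linear specification with calendar controls and preregistered policy dummies. For each outcome series $y_t$ (SVI on the native 0–100 scale) indexed by month $t$, we fit

$$y_t = \alpha + \beta\,(t - \bar{t}) + \sum_{m=2}^{12} \gamma_m D_{m,t} + \delta_1 P_{1,t} + \delta_2 P_{2,t} + \sum_{k=1}^{6} \left( \theta_k S_{k,t} + \phi_k R_{k,t} \right) + \varepsilon_t,$$

where $D_{m,t}$ are calendar-month indicators, $P_{1,t}$ and $P_{2,t}$ are the COVID pulses, $S_{k,t} = \mathbf{1}[t \ge \tau_k]$ is the step indicator for event $k$ with onset month $\tau_k$, and $R_{k,t} = (t - \tau_k)\,\mathbf{1}[t \ge \tau_k]$ is the post-event ramp; step-only events omit $R_{k,t}$ and slope-only events omit $S_{k,t}$.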
The centered linear trend captures secular drift; month fixed effects (February–December; January omitted) capture seasonality, and two one-month pulses absorb the abrupt COVID shock (4 March 2020). Policy milestones are encoded ex ante as step indicators and/or post-event ramps in months. The baseline excludes quarter_end to avoid redundancy with month fixed effects.
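The regressors described above can be assembled mechanically from the calendar, which is what makes the design leakage-free. The Python sketch below builds one design row per month under stated assumptions: the column names are hypothetical, the two COVID pulse months are assumed to be March and April 2020, the sample end (2025-12) is illustrative, and the step+ramp shape for the four events other than the two January 2024 milestones is an illustrative choice. A ramp counts months elapsed since onset, so it equals h at horizon h.

```python
# Sketch of the monthly event-study design (assumptions flagged in comments).

# Hypothetical column names; dates are the six preregistered milestones.
# Shapes of the two January 2024 events follow the text (step-only, slope-only);
# step + ramp for the other four events is an illustrative assumption.
EVENTS = {
    "golive": ((2021, 10), ("step", "ramp")),
    "b2g_phase1": ((2023, 9), ("step", "ramp")),
    "central_admin_full": ((2024, 1), ("step",)),
    "vat_alignment": ((2024, 1), ("ramp",)),
    "b2g_rest": ((2024, 6), ("step", "ramp")),
    "eu_b2b_authorization": ((2025, 3), ("step", "ramp")),
}
# Assumption: the two one-month COVID pulses fall in March and April 2020.
COVID_PULSES = [(2020, 3), (2020, 4)]


def month_index(year, month):
    """Consecutive integer index for a calendar month."""
    return 12 * year + (month - 1)


def design_row(year, month, t_center):
    """Regressors for one month: intercept, centered trend, month FE, pulses, events."""
    t = month_index(year, month)
    row = {"const": 1.0, "trend": t - t_center}
    for m in range(2, 13):  # February-December dummies; January is omitted
        row[f"m{m:02d}"] = 1.0 if month == m else 0.0
    for i, pulse in enumerate(COVID_PULSES, start=1):
        row[f"covid{i}"] = 1.0 if (year, month) == pulse else 0.0
    for name, ((ey, em), shape) in EVENTS.items():
        tau = month_index(ey, em)
        if "step" in shape:  # level shift from the event month onward
            row[f"{name}_step"] = 1.0 if t >= tau else 0.0
        if "ramp" in shape:  # months elapsed since onset (0 before and at onset)
            row[f"{name}_ramp"] = float(max(0, t - tau))
    return row


# Usage: an illustrative 2016-01 to 2025-12 sample with a centered trend.
months = [(y, m) for y in range(2016, 2026) for m in range(1, 13)]
center = sum(month_index(y, m) for y, m in months) / len(months)
july_2024 = design_row(2024, 7, center)
print(july_2024["vat_alignment_ramp"], july_2024["central_admin_full_step"])  # 6.0 1.0
```

Keeping every column a deterministic function of the calendar means the same function can generate rows for future months during nowcasting without touching any outcome data.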
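Primary estimation uses OLS with Newey–West HAC (6) standard errors (six lags; preregistered). For interpretation, we summarize each event’s impact at horizon $h$ months after onset via the linear combination

$$\Delta_{k,h} = \theta_k + h\,\phi_k,$$

where $\theta_k$ and $\phi_k$ are the step and ramp coefficients of event $k$ (either is set to zero for step-only or slope-only events), and we report HAC-robust 95% confidence intervals computed from the robust covariance of $(\hat{\theta}_k, \hat{\phi}_k)$. To address multiplicity within each outcome series, we apply the Benjamini–Hochberg FDR adjustment to event-level p-values. Our primary estimand is $\Delta_{k,6}$ (reported as Δ6 below), because it provides an operationally meaningful medium-run window while remaining close to the policy period.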
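Because $\Delta_{k,h}$ is a linear combination $a^\top b$ with $a = (1, h)$, its variance is $a^\top V a$, where $V$ is the HAC covariance of the step and ramp coefficients. The minimal Python sketch below computes the point estimate, standard error, a normal-approximation 95% interval, and a two-sided p-value; the function name and the numbers in the usage line are hypothetical placeholders, not estimates from this study.

```python
import math
from statistics import NormalDist


def event_impact(theta, phi, cov, h=6, level=0.95):
    """Delta_h = theta + h * phi with a normal-approximation confidence interval.

    cov is the 2x2 HAC covariance of (theta, phi) as nested lists
    [[var_theta, cov_theta_phi], [cov_theta_phi, var_phi]].
    For a step-only event pass phi = 0 and zero its row/column;
    for a slope-only event do the same for theta.
    """
    est = theta + h * phi
    # Var(a'b) = a' V a with a = (1, h)
    var = cov[0][0] + h * h * cov[1][1] + 2 * h * cov[0][1]
    se = math.sqrt(var)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(est / se)))
    return est, se, (est - z * se, est + z * se), p_value


# Usage with made-up coefficients (not results from the study):
est, se, ci, p = event_impact(theta=10.0, phi=2.0, cov=[[4.0, 0.5], [0.5, 0.25]])
print(est, round(se, 3))  # 22.0 4.359
```

The covariance term matters: with a positive step–ramp covariance, the interval for Δ6 is wider than one built from the two standard errors as if they were independent.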
Serial dependence is assessed using Ljung–Box tests and ACF/PACF diagnostics; when residual autocorrelation is substantial, we report an AR (1) error variant (SARIMAX with identical exogenous regressors) as a robustness check in Appendix A. All top-line inferences are based on the preregistered OLS + HAC specification for transparency and comparability across series. Our preregistered confirmatory claims center on the five priority outcomes × six events. For these “main-claims” tests, we report both (i) BH–FDR within series across events and (ii) a pooled BH–FDR across the full priority family (5 × 6 = 30 tests) to align inference with cross-series narrative comparisons. Results outside the priority family are treated as secondary/exploratory and are described without strong inferential language. January 2024 is encoded as two distinct interventions because they represent conceptually different mechanisms: a back-office coverage completion (step-only) versus a workflow harmonization process expected to accumulate gradually (slope-only). This split was preregistered to avoid post hoc tailoring.
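Both BH–FDR variants use the same step-up procedure; only the set of p-values passed in differs (six per series for the within-series adjustment, all thirty for the pooled one). The pure-Python sketch below is a minimal illustration; the helper names are hypothetical and the p-values in the usage line are placeholders, not results.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min  # enforce monotonicity of the q-values
    return adjusted


def within_and_pooled(pvals_by_series):
    """Return (within-series, pooled) BH q-values keyed by series code."""
    within = {s: bh_adjust(ps) for s, ps in pvals_by_series.items()}
    flat = [p for ps in pvals_by_series.values() for p in ps]
    pooled_flat = bh_adjust(flat)
    pooled, k = {}, 0
    for s, ps in pvals_by_series.items():
        pooled[s] = pooled_flat[k:k + len(ps)]
        k += len(ps)
    return within, pooled


# Placeholder p-values for two series (four events each, for brevity):
within, pooled = within_and_pooled({"A2": [0.01, 0.04, 0.03, 0.20],
                                    "C1": [0.5, 0.5, 0.5, 0.5]})
print([round(q, 4) for q in within["A2"]])  # [0.04, 0.0533, 0.0533, 0.2]
```

The pooled adjustment is usually stricter for a given series because its smallest p-values are ranked against a larger family of tests.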
4.2. RQ2: Nowcasting Design
We evaluate short-horizon predictive value using blocked rolling-origin cross-validation that mimics real-time deployment. Origins start in 2018-01 and advance in 6-month increments; at each origin, models are trained on up to the previous 48 months and forecast $h \in \{1, 2, 3\}$ months ahead [4,9,16]. All features are deterministic functions of calendar time and prespecified policy dates, ensuring strict no-leakage: design matrices for the forecast months $T+1, \dots, T+h$ are constructed using only trend continuation, month indicators, COVID pulses, and event step/ramp rules fixed ex ante. We compare three transparent forecasters:
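SNAIVE (12): $\hat{y}_{T+h} = y_{T+h-12}$ for $h \le 12$, i.e., each month is forecast by its observed value one year earlier.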
OLS + events: the structural regression with trend, month fixed effects, COVID pulses, and prespecified event indicators.
OLS + events + AR (1): the same exogenous specification estimated with AR (1) errors (SARIMAX) to capture residual autocorrelation.
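The blocked rolling-origin loop and the SNAIVE (12) benchmark can be sketched in a few lines of Python. The sketch assumes that index 0 is 2016-01 and that an “origin” is the first month not used for training, so 2018-01 corresponds to index 24; both readings, and the function names, are illustrative assumptions rather than the study’s code.

```python
def rolling_splits(n_obs, first_origin, step=6, window=48, horizons=(1, 2, 3)):
    """Blocked rolling-origin splits as (train_indices, horizon, target_index).

    Assumption: an origin is the first month not used for training, so the
    h-step-ahead target is origin + h - 1 and training uses at most `window`
    months immediately before the origin.
    """
    splits = []
    for origin in range(first_origin, n_obs, step):
        train = range(max(0, origin - window), origin)
        for h in horizons:
            target = origin + h - 1
            if target < n_obs:
                splits.append((train, h, target))
    return splits


def snaive12(y_train, h, season=12):
    """SNAIVE(12): the forecast for month T + h repeats the value 12 months earlier."""
    if not 1 <= h <= season or len(y_train) < season:
        raise ValueError("need 1 <= h <= 12 and at least 12 training months")
    T = len(y_train)  # y_train[T - 1] is the last observed month
    return y_train[T - 1 + h - season]


# Usage: 120 months starting 2016-01, first origin 2018-01 (index 24).
y = [float(v % 12) for v in range(120)]  # toy, purely seasonal series
errors = []
for train, h, target in rolling_splits(len(y), first_origin=24):
    forecast = snaive12([y[i] for i in train], h)
    errors.append(y[target] - forecast)
print(len(errors), max(abs(e) for e in errors))  # 48 0.0
```

On a purely seasonal toy series SNAIVE (12) is exact, which is why it is a demanding benchmark: event models only help where dated, non-seasonal shocks drive the variance.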
Forecast accuracy is summarized by MAE and RMSE (primary), with sMAPE/MASE reported for completeness; we also report percentage MAE improvement relative to SNAIVE (12). Statistical comparisons against SNAIVE (12) are conducted using Diebold–Mariano tests with absolute error loss (series × horizon), reported concisely.
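The accuracy metrics and the Diebold–Mariano comparison can be written directly from their textbook definitions. In the Python sketch below, the long-run variance of the loss differential uses autocovariances up to lag h − 1 (the rectangular window of the original DM formulation), the p-value is two-sided under a normal approximation, and sMAPE treats a pair of zeros as contributing zero; these conventions and all function names are assumptions for illustration, and the toy errors are not study results.

```python
import math
from statistics import NormalDist, fmean


def mae(errors):
    return fmean(abs(e) for e in errors)


def rmse(errors):
    return math.sqrt(fmean(e * e for e in errors))


def smape(actual, forecast):
    """Symmetric MAPE in percent; a pair with both values 0 contributes 0 (a convention)."""
    terms = [
        0.0 if a == 0 and f == 0 else 2 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
    ]
    return 100 * fmean(terms)


def mase(errors, y_insample, season=12):
    """MAE scaled by the in-sample MAE of the seasonal-naive forecast."""
    scale = fmean(abs(y_insample[t] - y_insample[t - season])
                  for t in range(season, len(y_insample)))
    return mae(errors) / scale


def mae_improvement_pct(errors_model, errors_bench):
    """Percentage MAE reduction relative to the benchmark (positive = better)."""
    return 100 * (1 - mae(errors_model) / mae(errors_bench))


def diebold_mariano(errors_model, errors_bench, h=1):
    """DM statistic with absolute-error loss; negative values favor the model."""
    d = [abs(a) - abs(b) for a, b in zip(errors_model, errors_bench)]
    n = len(d)
    d_bar = fmean(d)

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    lrv = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance; DM statistic undefined")
    stat = d_bar / math.sqrt(lrv / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Usage with toy errors (not results from the study):
bench = [2, -2, 2, -2, 2, -2, 2, -2]
model = [1, -1, 1, -1, 1, -1, 1, -2]
print(mae_improvement_pct(model, bench))  # 43.75
```

With only a handful of forecast origins per (series, horizon), DM p-values are fragile, which is one reason to pair them with the bootstrap intervals on MAE differences.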
Prediction intervals (80% and 95%) are constructed from out-of-sample residual dispersion (Gaussian bands), with a seasonal block bootstrap used as a robustness option to respect monthly dependence. Additional baselines, blends, and extended diagnostic plots are reported in Appendix A to preserve readability.
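Gaussian bands and their empirical coverage are straightforward to compute. The Python sketch below assumes that “residual dispersion” means the sample standard deviation of out-of-sample residuals for the same series and horizon, and it clips bands to the 0–100 SVI range; both choices, the function names, and the toy residuals are illustrative assumptions.

```python
from statistics import NormalDist, stdev


def gaussian_pi(point, oos_residuals, level, bounds=(0.0, 100.0)):
    """Symmetric normal band around a point forecast, clipped to the SVI range.

    Assumption: dispersion is the sample SD of out-of-sample residuals.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * stdev(oos_residuals)
    lo, hi = point - half, point + half
    return max(bounds[0], lo), min(bounds[1], hi)


def empirical_coverage(actuals, intervals):
    """Share of realized values that fall inside their prediction intervals."""
    hits = sum(lo <= y <= hi for y, (lo, hi) in zip(actuals, intervals))
    return hits / len(actuals)


residuals = [-3.0, 1.5, 2.0, -0.5, 4.0, -2.5]  # toy residuals
pi80 = gaussian_pi(50.0, residuals, 0.80)
pi95 = gaussian_pi(50.0, residuals, 0.95)
print(pi80[1] - pi80[0] < pi95[1] - pi95[0])  # True
```

Comparing empirical coverage with the nominal 80%/95% levels shows whether the Gaussian bands are too narrow around milestone spikes, where residuals are typically heavier-tailed.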
Auxiliary machine learning (ML) models are treated as exploratory robustness checks and are reported in Appendix A. Specifically, we test low-capacity global residual learners that predict the seasonal-naïve residual $y_t - y_{t-12}$ using deterministic calendar/event features (trend, month/quarter harmonics, event step/ramp indicators, COVID pulses) and then add the predicted residuals back to SNAIVE (12). These models are intentionally regularized (shallow trees/small MLP with early stopping) to limit overfitting in short panels; we highlight ML results only when they match or exceed the best structural model consistently across rolling splits for a given (series, horizon). The structural models remain the default due to interpretability and replicability. Forecast value is evaluated series-by-series and horizon-by-horizon; we report instances where the event model underperforms SNAIVE (12) as failures, not exceptions. We quantify uncertainty in MAE differences using a paired bootstrap across forecast origins, and we report empirical PI coverage for nominal 80%/95% intervals.
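A paired bootstrap resamples forecast origins and keeps both models’ errors at each drawn origin together, so the resampled statistic is simply the mean per-origin difference in absolute error. The Python sketch below returns a percentile interval; it assumes one absolute error per origin for a fixed (series, horizon) pair, and the function name, seed, and toy errors are illustrative placeholders.

```python
import random
from statistics import fmean


def paired_bootstrap_mae_diff(abs_err_model, abs_err_bench, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for MAE(model) - MAE(bench) across forecast origins."""
    if len(abs_err_model) != len(abs_err_bench):
        raise ValueError("error lists must align origin by origin")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(abs_err_model, abs_err_bench)]
    n = len(diffs)
    # Resample origins with replacement; each draw keeps the pair together.
    draws = sorted(
        fmean(diffs[rng.randrange(n)] for _ in range(n)) for _ in range(n_boot)
    )
    lo = draws[int(n_boot * alpha / 2)]
    hi = draws[int(n_boot * (1 - alpha / 2)) - 1]
    return fmean(diffs), (lo, hi)


model_err = [1.0, 3.0, 2.0, 5.0, 4.0, 2.0]
snaive_err = [2.0, 2.0, 4.0, 5.0, 6.0, 3.0]
point, (lo, hi) = paired_bootstrap_mae_diff(model_err, snaive_err)
print(round(point, 3), lo <= point <= hi)  # -0.833 True
```

Pairing removes the common origin-level difficulty (e.g., a milestone month that is hard for every model), which typically yields tighter intervals than resampling each model’s errors independently.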
4.3. RQ3: Which Families Move?
RQ3 summarizes heterogeneous attention responses across three query families, platform (A), app (C), and ecosystem (D), using the RQ1 event-study estimand. Family anchors are the five priority outcomes: platform (A2 aade, A3 ααδε), app (C1, C2 timologio), and ecosystem (D1, the Electronic invoicing topic). We classify “movement” for each (event, series) pair using the sign and BH-FDR significance of the primary medium-run estimand Δ6: an increase if Δ6 > 0 and it is significant after BH-FDR adjustment, a decrease if Δ6 < 0 and it is significant, and no movement otherwise.
To distinguish abrupt from gradual responses, we append “(S)” when either the step component or the slope component is significant after BH-FDR adjustment even if Δ6 is marginal, indicating the dominant driver (level vs. ramp). Family-level summaries aggregate these classifications across the relevant anchors (A2–A3, C1–C2, and D1).
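The classification rule can be expressed as a small function so that every (event, series) label is reproducible from the adjusted q-values. In the Python sketch below, the FDR threshold of 0.05 is an assumption, the function name is hypothetical, and the q-values in the usage lines are placeholders (only the Δ6 magnitudes echo values reported in the Discussion).

```python
def classify_movement(delta6, q_delta6, q_step=None, q_slope=None, alpha=0.05):
    """Label one (event, series) pair from Delta_6 and BH-FDR adjusted q-values.

    Assumption: alpha = 0.05 is the FDR threshold. q_step / q_slope are the
    adjusted q-values of the step and ramp coefficients (None if the event
    has no such term).
    """
    if q_delta6 < alpha:
        label = "increase" if delta6 > 0 else "decrease"
    else:
        label = "no movement"
    # "(S)": a significant step or slope component, even if Delta_6 is marginal.
    if any(q is not None and q < alpha for q in (q_step, q_slope)):
        label += " (S)"
    return label


print(classify_movement(38.4, 0.001))                  # increase
print(classify_movement(4.0, 0.20, q_slope=0.01))      # no movement (S)
```

Family-level summaries then reduce to counting labels across the anchors of each family.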
As a family-agnostic summary, we construct a composite attention index as the mean of the z-scored priority series (A2, A3, C1, C2, D1); the first principal component (PCA-1) is used as a robustness alternative. We re-estimate the same event-study model on the composite and apply the same Δ6 and BH-FDR decision rules. Planned sensitivity checks mirror the preregistered robustness set: HAC (12), event lags (+1/+2 months), log-scale outcomes, STL-deseasoned outcomes, and placebo events shifted −24 months; where AR dependence is strong, the AR (1) error variant is reported as a stability check in Appendix A.
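The composite index and the placebo dates are both simple transformations. The Python sketch below standardizes each series with its population standard deviation (a choice; the sample SD would only rescale the index) and averages the z-scores month by month with equal weights; it also shifts a milestone by −24 months for the placebo check. Function names and the toy values are hypothetical.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Standardize a series to mean 0 and (population) SD 1."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def composite_index(series_by_code):
    """Month-by-month mean of the z-scored priority series (equal weights)."""
    standardized = [zscore(v) for v in series_by_code.values()]
    return [fmean(month) for month in zip(*standardized)]


def shift_months(year, month, k):
    """Calendar month k months away, e.g. k = -24 for the placebo event dates."""
    idx = 12 * year + (month - 1) + k
    return idx // 12, idx % 12 + 1


toy = {"A2": [1, 2, 3], "C2": [10, 20, 30], "D1": [5, 5, 8]}  # toy SVI values
print([round(v, 3) for v in composite_index(toy)])
print(shift_months(2021, 10, -24))  # (2019, 10)
```

Standardizing first prevents a single high-volume series from dominating the composite, which is what allows it to test whether the dynamics are driven by one series.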
6. Discussion
This study examined whether prespecified milestones in Greece’s myDATA/e-invoicing rollout were associated with changes in public search attention (RQ1), whether encoding those milestones improves short-horizon nowcasts of attention (RQ2), and which query families respond most (RQ3). The results are best interpreted as evidence about salience and information-seeking—an intermediate signal of user attention and potential onboarding friction—rather than as evidence of compliance, reporting accuracy, or policy success. Consistent with issue-attention perspectives, “front-stage” milestones tend to coincide with abrupt, short-lived shifts in task-oriented searches, whereas harmonization and coverage changes are more often reflected in gradual ramps or drawdowns.
6.1. Do Events Move Attention?
Across the preregistered milestones, the strongest and most consistent responses appear in the application (“app”) family. In particular, timologio (C2) rises at go-live (Δ6 ≈ +38.4 SVI) and C1 rises at B2G Phase 1 (Δ6 ≈ +33.6), while both fall at VAT–myDATA alignment (C1 Δ6 ≈ −43.1; C2 Δ6 ≈ −34.3). The ecosystem topic Electronic invoicing (D1) shows large, opposite-signed medium-run shifts: an increase around VAT–myDATA alignment (Δ6 ≈ +46.4) and a decline after the B2G rest-of-public expansion (Δ6 ≈ −52.7). Platform terms are more heterogeneous and often weaker: ααδε (A3) increases at go-live (Δ6 ≈ +15.1), whereas aade (A2) declines after B2G rest-of-public (Δ6 ≈ −22.2), with other effects small or indistinguishable from zero after FDR adjustment. The central administration full-coverage milestone (January 2024) shows no measurable shift, consistent with a back-office milestone lacking a user-facing call to action.
Two interpretation boundaries are important. First, we prioritize Δ6 in SVI points on the native 0–100 scale; this avoids the mechanical inflation that can arise when percentage changes are computed from low baselines. Percentage effects from log-scale robustness checks can appear extreme (e.g., ±300–500%) when pre-event SVI levels are near zero, even if the absolute change remains modest. Accordingly, we treat percentage changes as descriptive robustness and interpret them only alongside absolute SVI point shifts and baseline levels. Second, increased searching plausibly reflects information acquisition and task initiation (e.g., registration, software selection, configuration, issuance rules), but it does not establish uptake, compliance quality, or reporting accuracy.
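A short arithmetic check illustrates the low-baseline problem. The same +10 SVI-point shift yields very different percentages depending on the pre-event level; the log-scale line assumes that log outcomes are defined as log(1 + SVI), which is an assumption for illustration.

```python
import math


def pct_change(pre, post):
    """Percentage change relative to the pre-event SVI level."""
    return 100 * (post - pre) / pre


# The same +10 SVI-point shift from two different baselines:
print(pct_change(2, 12))   # 500.0
print(pct_change(50, 60))  # 20.0

# Assumption: log-scale outcomes use log(1 + SVI); back-transformed effect for 2 -> 12:
print(round(100 * (math.exp(math.log1p(12) - math.log1p(2)) - 1), 1))  # 333.3
```

Both extreme figures correspond to a modest 10-point absolute change, which is why absolute SVI shifts are the primary reporting unit.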
Within these limits, the pattern of step-like jumps for app queries around launch-type milestones and slower ramps around harmonization is consistent with standard intervention logic (level versus slope responses) and with attention allocation arguments: salient announcements trigger short-lived, action-proximal “how-to” demand, while harmonization reshapes workflows and shifts attention more gradually toward ecosystem-level concerns [9,16,23]. For operations, the practical implication is not that the policy “succeeded” but that certain milestones predictably coincide with concentrated information demand in specific parts of the ecosystem.
6.2. Do Event-Aware Models Forecast Better?
Encoding the prespecified policy calendar improves short-horizon forecasts relative to a seasonality-only benchmark in several series, with the clearest gains for platform terms at
and for app/ecosystem series at longer horizons. In rolling-origin validation, OLS + events reduces MAE versus SNAIVE (12) by roughly 40–50% at
for platform series (A2/A3), with more variable gains at
and horizon-dependent benefits for app and ecosystem outcomes. This pattern is consistent with a simple point: when variance is partly driven by interpretable, dated shocks (launches, phased mandates, harmonization), deterministic step/ramp indicators capture structure that seasonal repetition alone misses [12,20,23,28].
Horizon heterogeneity is informative rather than speculative. For example, timologio (C2) shows limited incremental value at
but clearer gains at
, consistent with the idea that medium-run ramps become predictable once month-to-month idiosyncrasy averages out. For D1, event terms add little at
but become more useful at
as slope components accumulate [9,16,30]. Importantly, these are forecasting improvements in attention, not proof of behavioral change; their value is operational, informing the timing of guidance, helpdesk capacity, and vendor coordination around known milestones. The results also align with the cautionary lesson from search-based forecasting critiques: structured, theory-consistent features can help, but idiosyncratic spikes and evolving search behavior limit one-size-fits-all gains.
6.3. Which Families Move?
The family ordering, with app (C) strongest, ecosystem (D) second, and platform (A) weakest, provides a compact summary of where attention concentrates during staged rollouts. App terms are closest to immediate tasks (issuing invoices, onboarding), so they exhibit sharper step-type responses around salient milestones; ecosystem terms tend to reflect rule changes, standards, and workflow reconfiguration, so they are more often expressed as ramps; platform/brand terms aggregate heterogeneous intents and are less diagnostic except at headline moments. The composite index corroborates that these dynamics are not driven by a single series [9,16,23].
Operationally, the implication is a sequencing logic rather than a success claim. Agencies can anticipate app-focused information demand around launch-type milestones (staffing and onboarding materials), while harmonization windows call for sustained ecosystem-oriented guidance (standards, procedures, vendor alignment). Conversely, purely administrative completions may remain low-salience and should not be expected to shift public attention without complementary communications.
Overall, the contribution is not that search attention equals compliance or performance but that a preregistered, transparent event-study design can (i) characterize heterogeneous attention responses across milestones and query families and (ii) yield modest but actionable improvements in short-horizon nowcasts of attention that inform communication and support-capacity planning in digital tax rollouts [12,20,23,28].
Table 13 summarizes which conclusions are robust to the prespecified checks and where interpretation remains sensitive.
8. Conclusions, Limitations, and Future Directions
This study examined whether prespecified milestones in Greece’s myDATA/e-invoicing rollout were associated with shifts in public search attention (RQ1), whether encoding those dates improves short-horizon forecasts of attention (RQ2), and which query families respond most strongly (RQ3). Using preregistered step/ramp indicators on monthly Google Trends data (2016–present) with HAC-robust inference and BH–FDR adjustment, we find a consistent ordering: app queries respond most clearly around launch-type milestones, ecosystem attention shifts more gradually, and platform terms are smaller and less regular; the back-office “central administration full” milestone is near-neutral. Event-aware models also improve out-of-sample nowcasts relative to a seasonal-naïve benchmark for some series and horizons, with the clearest gains in selected short-horizon cases [12,20,23,28]. Overall, policy timing appears to structure information-seeking in measurable ways, while search attention remains an intermediate signal rather than a compliance outcome.
These findings should be interpreted with caution. Google Trends captures attention, not adoption, compliance, or audit outcomes. Large percentage changes can partly reflect low baselines, so we prioritize SVI point effects in interpretation. The design is observational, and time-varying confounders may still coincide with milestones despite controls for trend, seasonality, and COVID pulses. Placebo results are therefore treated as stress tests, and non-trivial placebo significance reinforces a cautious, non-causal reading. In addition, Google Trends scaling, series stitching, and topic-versus-term differences may affect comparability, although we address these issues through validation and sensitivity checks.
Future research should link attention signals to administrative and behavioral outcomes such as helpdesk tickets, onboarding completion, active user counts, or e-invoice submissions to test whether search attention has measurable operational lead value. Richer designs could incorporate communication intensity and ecosystem activity, including media coverage, vendor releases, and professional association notices, in order to separate policy timing from concurrent narrative shocks. Higher-frequency data could also be used to examine anticipatory spikes and post-event decay, while subgroup analyses by region, industry, or user type could clarify who responds to which milestones [12,22,23]. Finally, applying the same preregistered framework to other digital tax reforms such as Making Tax Digital, SAF-T, or national e-invoicing mandates would test portability and help build comparative evidence across common rollout archetypes.