Early Peak Badges from Wi-Fi Telemetry: A Field Feasibility Study of Lunchtime Crowd Management on a Smart Campus

Variskhanov, Anvar; Arreeras, Tosporn

doi:10.3390/urbansci10010029

Open AccessArticle

Early Peak Badges from Wi-Fi Telemetry: A Field Feasibility Study of Lunchtime Crowd Management on a Smart Campus

by

Anvar Variskhanov

¹

and

Tosporn Arreeras

^1,2,3,*

¹

School of Management, Mae Fah Luang University, Chiang Rai 57100, Thailand

²

Urban Mobility Lab, School of Management, Mae Fah Luang University, Chiang Rai 57100, Thailand

³

Center of Excellence in AI and Emerging Technologies, Mae Fah Luang University, Chiang Rai 57100, Thailand

^*

Author to whom correspondence should be addressed.

Urban Sci. 2026, 10(1), 29; https://doi.org/10.3390/urbansci10010029

Submission received: 28 November 2025 / Revised: 25 December 2025 / Accepted: 31 December 2025 / Published: 3 January 2026

Download

Browse Figures

Versions Notes

Abstract

Smart cities increasingly reuse existing Wi-Fi infrastructure to sense crowding, but many smart-campus tools still fail to support routine, day-to-day decisions. A short-horizon field feasibility study was conducted to prototype a low-maintenance, prefix-based early-warning rule that turns anonymized campus Wi-Fi access-point counts into an interpretable lunchtime crowd signal. Daily 7-min access-point profiles from five university canteens (11:00–14:00) were aggregated, winsorized, smoothed, and row-z-scored, then clustered into demand-shape typologies using k-means++. Two typologies were obtained (Early Peak and Late Shift), and a cosine-similarity atlas was frozen. At 11:28, the five-bin occupancy prefix was compared to typology centroids, and an Early Peak badge was issued when similarity to the Early Peak centroid exceeded a preset threshold. On held-out days, the Early Peak typology could be identified at 11:28 with coverage of 0.73 and agreement of 0.86 relative to end-of-day labels. In 20 matched canteen-weekday pairs, badge days were associated with a Hodges–Lehmann median reduction of 0.193 standard-deviation units in peak crowding (≈9% lower). Given the short (3-week) horizon and limited hold-out window, results are presented as feasibility evidence and motivate a larger controlled evaluation. Simple, interpretable rules built on existing Wi-Fi telemetry were shown to be deployable as a feasibility-level decision aid on a smart campus, while broader smart-city transferability should be validated through longer-horizon controlled evaluations.

Keywords:

smart campus; smart city; Wi-Fi telemetry; crowd management; lunchtime demand; decision support system; early-warning rule; occupancy sensing; internet of things (IoT)

Graphical Abstract

1. Introduction

Smart city initiatives leverage digital technologies and data analytics to enhance quality of life, improve service efficiency, and support sustainable behaviors [1,2]. In this context, smart campuses act as small-scale cities that use ICT, IoT, and Wi-Fi telemetry to monitor space utilization and inform operational decisions. Prior work has used Wi-Fi logs, sensor networks, and crowdsensing to visualize occupancy, predict pedestrian flows, and monitor crowds in buildings and urban districts [3,4]. Lunchtime demand in campus canteens was observed to arrive in short surges that increase crowding and queuing when capacity and flow are not actively managed [5]. Simulation evidence suggests that timely, low-cost reallocations (e.g., opening an extra counter, shifting staff) can trim peaks without new capital, motivating a maintainable Wi-Fi–based early-classification decision-support approach for same-day management [6,7].

Wi-Fi access-point (AP) association counts provide campus-wide coverage without added hardware and have been shown to correlate with building occupancy [8,9,10]. However, most prior work stops at sensing/estimation rather than supporting same-day managerial actions in canteens [8,9,10,11]. This gap motivated converting raw AP counts into a small atlas of daily demand typologies and issuing a transparent, prefix-based badge before noon, followed by a matched-pairs evaluation of peak impact under field conditions [12,13]. Although smart-campus dashboards and occupancy visualizations [14] are becoming increasingly common, very few implementations have successfully delivered a straightforward, low-maintenance daily rule that is directly linked to an actionable crowd-management playbook for canteens and comparable facilities. Because interventions must occur before the main surge, early time-series classification (ETSC) was used. Similarity-based, thresholded approaches have been reported as competitive and interpretable for decision support [15]. In this paper, daily profiles were row-z-standardized and clustered using silhouette-guided k-means [16], and a cosine comparison between the five-bin 11:00–11:28 prefix and centroid prefixes enabled a gated decision rule [17].

The artifact pursued three objectives: (O1) transform 11:00–14:00 AP counts into standardized daily curves; (O2) discover a small set of typologies via silhouette-guided k-means; and (O3) issue a late-morning (≈11:28) badge by comparing the partial curve with centroid prefixes using a pre-specified cosine similarity threshold of 0.90. These objectives supported RQ1: whether a lightweight, shape-based Wi-Fi pipeline can issue an early badge that is associated with lower peak occupancy relative to matched non-intervention days within the study horizon, the same canteen, and weekday. Two typologies were obtained (average silhouette ≈ 0.49): Early Peak and Late Shift. On held-out days, decision quality at 11:28 reached coverage 0.73 and agreement 0.86, and operational outcomes across 20 matched pairs indicated a Hodges–Lehmann median reduction of 0.193 z (one-sided 95% lower bound 0.068; Wilcoxon W = 169, p = 0.0077).

Because cluster structure was moderate and context-dependent, the derived Early Peak/Late Shift labels were interpreted as an operational partition for this campus and study window. The main value was provided by the methodology and deployment pattern rather than by the specific labels. Broader transferability was framed as transfer of the methodology rather than of fixed typology identities.

This paper contributes to the smart-city and smart-campus literature in four ways. First, it shows how existing Wi-Fi access-point logs can be repurposed as a privacy-preserving crowd sensor for day-to-day management of campus food facilities without new hardware. Second, it proposes a low-maintenance, prefix-based early-warning rule that uses cosine similarity to a frozen atlas of lunchtime occupancy typologies, avoiding the retraining burdens of complex forecasting models. Third, it provides a field feasibility study using matched canteen-weekday pairs and nonparametric effect sizes, estimating the association between badge-triggered information actions and peak crowding under routine operations. Fourth, it releases anonymized Wi-Fi data and complete analysis code to support replication and adaptation on other campuses and smart-city services. All analyses used aggregated AP counts without device identifiers, and results were reported at day-level granularity in line with privacy guidance [18]. The remainder of the paper describes the setting and data, details the method, and evaluates discovery quality, early-classification performance, and operational outcomes.

2. Related Work

2.1. Wi-Fi Access Point Telemetry as an Occupancy Proxy

Recent research has explored using existing Wi-Fi infrastructure to infer building occupancy in real-time. The ubiquity of Wi-Fi-enabled devices (smartphones, laptops, etc.) means that Wi-Fi access point (AP) telemetry can serve as a passive indoor people counter or occupancy sensor [19]. For example, it was shown that analyzing Wi-Fi probe request signals from smartphones allows near real-time estimation of crowd sizes, achieving occupancy count accuracy close to 95% in large venues [19]. Alishahi et al. (2022) demonstrated Wi-Fi counts outperforming CO₂ sensors for occupancy detection in library deployments [9]. This establishes AP-level aggregates as a practical sensing layer for service operations analytics.

Most modern campuses already operate enterprise Wi-Fi networks that log the number of client devices actively associated with each access point (AP) at regular intervals. Unlike passive probe-request sniffing, these data are generated natively by the infrastructure’s management information system (MIS) and therefore require no extra hardware, packet capture, or MAC-address de-randomization [9,13]. For campus-scale settings specifically, existing Wi-Fi infrastructure makes things convenient because the coverage already exists, signals are aggregated, and privacy risks can be addressed through count-only processing rather than device tracking [20]. In line with privacy guidance for network-derived occupancy, this research relies on aggregated AP counts (no device identifiers) and day-level reporting. Differentially private reporting schemes have also been demonstrated for campus Wi-Fi occupancy [18].

Beyond campus settings, Wi-Fi scanner data have also been used to analyze tourist mobility patterns at the scale of an entire region. In a large tourism area in Hokkaido, Wi-Fi-based digital footprints were converted into origin–destination “transactions” and analyzed with association rule mining and visualization to uncover multi-stop travel patterns and destination attractiveness for sustainable destination management [21]. However, these Wi-Fi-based crowd-sensing studies do not convert telemetry into same-day operational early warnings or evaluate the impact of an explicit decision rule on crowding, as performed in the present study.

The proposed approach was positioned within a broader class of non-intrusive occupancy inference methods that derive occupancy from indirect sensing signals rather than personal identifiers. For example, occupI infers occupancy states from aggregate energy-consumption patterns using a non-intrusive pipeline, highlighting the same privacy-preserving design goal of extracting actionable occupancy information without personal data [22].

2.2. Time-Series Clustering of Daily Load Shapes

Clustering is a standard way to discover recurring patterns (“typologies”) from demand curves, giving interpretable centroids that reveal some general behavior [23]. Shape-based algorithms, such as k-Shape, calculate phase-invariant similarities and then class centroids directly from normalized data. They have been shown to cluster day curves accurately and efficiently across domains [24]. Recent research and surveys suggest that shape-preserving distance/centroid choices improve interpretability for operations, which is crucial when typologies drive managerial decisions [25]. Within the residential-energy literature (energy domain), many works cluster daily load profiles (e.g., smart-meter curves) using k-means methods with normalization, selecting k by internal validity, and interpreting the prototypes as consumer or building usage classes [26]. While deep clustering is advancing, centroid-based motifs remain preferable in building operations settings because they are transparent and easy to map to playbooks [25,27]. Studies on energy time series also emphasized that method choice should balance accuracy and operational clarity, which matches a trade-off this paper considers when creating simple centroids for downstream decision rules [28].

2.3. Validating and Stabilizing Clusters

One of the most important questions in unsupervised learning is how many clusters to keep. Internal validity index in the form of average silhouette width (ASW) has been recommended by modern reviews for achieving a balance between cohesion and separation [29,30]. Apart from internal fit, stability (external) validation is also emphasized. Resampling-based checks (split-half, subsampling, or consensus) are performed to test whether typologies reappear on new data [31]. For data that changes over time, such as in real-world applications, specific metrics like the temporal silhouette are used to handle so-called concept drift. This is a case where the underlying data distribution changes, causing a model’s performance to decline over time. Therefore, it is crucial to report not just how well a model fits the initial data, but also its stability or how well it maintains its performance as the data changes [32].

Preprocessing affects cluster structure, so light smoothing and outlier limiting help suppress transient spikes, while z-normalization emphasizes shape over magnitude, which is a common choice for daily profile clustering and for downstream shape matching [33]. Documenting these steps and fixing them fundamentally improves replicability [34]. This supports a pre-specified pipeline (light smoothing, winsorization, row-wise z-scores) before clustering in this research.

2.4. Early Time-Series Classification (ETSC) and Prefix Decisions

Operational situations often require classifying a day early from a partial prefix. ETSC proposes decision rules that trigger a label once confidence exceeds a threshold by recognizing the accuracy-earliness tradeoff [35]. For example, the TEASER algorithm in particular provides a principled baseline that issues early decisions only when confident, similar to the “badge when cosine ≥ threshold” rule used in this research [35].

Recent methods like CALIMERA demonstrate that reliable early decisions are feasible across benchmarks when calibration is handled carefully [36]. Empirical ETSC frameworks now include evaluation protocols that report earliness and accuracy, which were represented via coverage (share of days confidently labeled at 11:28) and agreement (match to end-of-day typology) in this research [15]. Reviewed literature supports the idea that frozen centroid prefixes and cosine similarity can provide fast and early labels that are sufficient for operational playbooks, with lower computational cost than training a full predictive model online each day.

2.5. Clustering-Then-Early Matching vs. Forecasting

Models used to predict future trends from time-series data often become less accurate when the underlying patterns change. To maintain accuracy, a lot of research has focused on developing methods for constantly updating these models and adapting them to these changes (concept drifting problem) [37]. Studies explicitly pose online ensembling or normalization as responses to concept drift and evolving dynamics, admitting the maintenance burden present in daily deployment [38,39]. Accurate forecasting with data that changes over time is a significant challenge. To handle this, models need to be continually retrained or adapted, and teams must regularly select the best model and manage its settings. For small teams with limited resources, these ongoing tasks can be very demanding and difficult to maintain [37]. In contrast, clustering-then-early-matching bypasses full-trajectory prediction. The day’s shape is identified early by comparing its prefix to a small atlas of frozen centroids, with occasional offline refreshes instead of continuous model updates [23]. For operational decision-making (e.g., queue management), interpretable typology labels and simple prefix matching can give timely actions with lower maintenance and at the same time still recognize drift via periodic stability checks and atlas refresh triggers [31,32].

3. Materials and Methods

3.1. Methods Overview

To aid the understanding of the multi-stage pipeline, a schematic overview is provided in Figure 1. The diagram summarizes the methodological flow from raw Wi-Fi counts to preprocessing, temporal holdout, clustering, early same-day classification, evaluation, and deployment. This overview highlights how raw telemetry is transformed into interpretable typologies and how those typologies are used for real-time operational labeling. The pipeline is split into three parts, with the offline atlas representing the construction of daily demand centroids. The online part represents the real-time classification of partial-day signals via cosine matching. The evaluation part measures pipeline performance using preset metrics. As sketched in Figure 1, the offline stage allowed for up to three lunchtime typologies, but the combination of silhouette width and prediction strength selected k = 2 for the final atlas (Early Peak and Late Shift; see Section 4).

This is a small-sample field feasibility/operational demonstration designed to test whether a frozen centroid + early-matching policy can be executed prospectively and yield directionally beneficial peak reductions under routine conditions. The sample size reflects resource and access constraints typical of service operations trials. A precision/estimation focus (one-sided HL lower bounds) was adopted rather than a high-powered effectiveness test. The artifact was intentionally designed to fit into routine smart-campus management workflows, requiring no per-day model fitting and exposing a single interpretable badge rather than a complex forecast. Methodologically, the contribution is not a new sensing modality or clustering algorithm, but an operationally complete design pattern: (1) a frozen, two-centroid typology atlas learned offline from Wi-Fi counts; (2) a reject-option early classification rule that triggers only under high similarity at a fixed decision time; and (3) a paired, nonparametric evaluation that reports a policy-relevant minimum gain via one-sided Hodges–Lehmann lower bounds. This combination is designed for low-maintenance deployment and auditable decision support using existing campus infrastructure.

Operational deployment steps and interface details are provided in the repository cited in the Data Availability Statement.

3.2. Setting and Data

The campus is part of a broader smart-city initiative and is equipped with pervasive Wi-Fi coverage, which is already used for IT operations and can be reused as an anonymized crowd-sensing layer [40]. Service operations were observed in five campus canteens in Mae Fah Luang University (MFU), Thailand. Wi-Fi association counts from nearby access points were aggregated from 11:00 to 14:00 across three consecutive weekday weeks (4–26 September 2025). Weeks 1–2 were used for model selection, and Week 3 was held out for evaluation. The university operates an enterprise-grade Wi-Fi network across all teaching buildings and canteens. For this study, aggregate association counts published by the IT office via a dashboard and JSON API were used. Each record reports, for every access point covering a canteen, the number of active client associations at the end of each 7-min interval. The seven-minute step matches the update frequency of the underlying infrastructure. Device identifiers are anonymized by the university before aggregation. Raw MAC addresses and radio-level diagnostics are not available to the authors. Therefore, the Wi-Fi infrastructure is treated as an existing crowd-sensing layer, and the focus is on the analytics pipeline that transforms these aggregate counts into typologies and Early Peak badges.

Only aggregate counts were processed, and device identifiers were neither accessed nor stored for occupancy analysis on smart campuses [4]. Wi-Fi associations were adopted as a pragmatic, privacy-preserving occupancy proxy, consistent with prior validation against camera ground truth [8,9,41]. Days were excluded if more than 2 of the 26 bins were missing or if network outages were logged. Canteen capacities and AP coverage are given in Table 1. The data was collected during weekdays of the first semester. Exam weeks and atypical event days were excluded so that the atlas captures routine lunchtime operations rather than exceptional surges.

AP association logs count devices, not people. The device-person ratio can drift with multi-device users, handset sleeping, and stationary devices, and coverage varies with AP placement. Prior facilities studies report strong correlations between AP connections and ground-truth occupancy, but they also recommend simple stabilizers (e.g., removing stationary devices, restricting to well-covered APs) and clear reporting of aggregation levels [9]. In this study, light outlier damping and smoothing were applied when working with aggregate AP counts, and emphasis was put on shape (per-day z-scores) over magnitude for the early decision. Privacy was preserved by avoiding device identifiers and publishing only day-level results. De-identified Wi-Fi aggregate counts, analysis code, and entire pipeline code have been deposited to a repository referenced in the Data Availability Statement.

3.3. Preprocessing and Representation

To reduce the influence of rare spikes while preserving temporal shape, counts were winsorized at a high percentile and lightly smoothed with a short symmetric window. When binning to fixed 7 min intervals, canteen locations and days were indexed as l and d, respectively; t index bin starts within [11:00, 14:00) at a fixed step Δ = 7, giving T = 26 bins. For an event at minute-of-day m, the bin start was computed as

b (m) = 11 : 00 + ⌊\frac{| m - 11 : 00 |}{Δ}⌋ Δ

(1)

Per canteen-day, counts were aggregated (summed) into the bin starts {t = 1,…,T}. Missing bins after pivoting were filled with 0, producing a length-T vector of raw counts:

x_{l, d} = {(x_{l, d, 1}, \dots, x_{l, d, T})}^{⊤}

(2)

To reduce the undue influence of extreme spikes while preserving ranking, a per-curve upper cap was applied. For each

x_{l, d}

, the q-quantile (default q = 0.995) was computed, and an upper-tail winsorized series was obtained:

{\tilde{x}}_{l, d, t} = \min (x_{l, d, t}, Q_{q} (x_{l, d})), t = 1, \dots, T

(3)

Upper-only winsorization removes rare, harmful spikes while preserving the valley structure that retains lunch-rush timing. It aligns with the operational objective (reduce peaks) and avoids masking genuine quiet periods or data-quality issues. Local irregularities were reduced by using two-stage smoothing. In the first stage, the centered median value m at time t is found. It is the median of all the original

\tilde{x}

values from h steps in the past (t − h) to h steps in the future (t + h):

m_{l, d, t} = median ({{\tilde{x}}_{l, d, t + j}}_{j = - h}^{h})

(4)

The second stage filter takes the output from the first stage (4) and applies a standard moving average to it according to window width w bins. Let w be an odd window length and h = (w − 1)/2. The smoothed value at time t was computed as

{\hat{x}}_{l, d, t} = \frac{1}{w} \sum_{j = - (w - 1) / 2}^{(w - 1) / 2} m_{l, d, t + j}

(5)

Only positions with a full window were smoothed. The first and last h-bins remained unsmoothed. In this study, w = 3 (h = 1). Each canteen day was then standardized row-wise (zero mean, unit variance) so that later analysis emphasized shape rather than absolute magnitude. For each

{\hat{x}}_{l, d}

, the mean and standard deviation were computed across the T bins,

μ_{l, d} = \frac{1}{T} \sum_{t = 1}^{T} ({\hat{x}}_{l, d, t}), σ_{l, d} = \sqrt{\frac{1}{T} \sum_{t = 1}^{T} {({\hat{x}}_{l, d, t} - μ_{l, d})}^{2}}

(6)

and z-scores were obtained with a small numerical guard ε (e.g., 10⁻⁹):

z_{l, d, t} = \frac{{\hat{x}}_{l, d, t} - μ_{l, d}}{σ_{l, d} + ε}, t = 1, \dots, T

(7)

The resulting 26-dimensional vectors (one per day) formed the input to clustering and early classification. All pre-specified analysis parameters and their operational rationale are listed in Appendix A, Table A1. All preprocessing steps (binning, winsorisation, smoothing, and row-wise z-scoring) are implemented in the open analysis code repository referenced in the Data Availability Statement, which mirrors the sequence described here.

3.4. Typology Discovery and Prefix-Based Matching

Recurring lunch-demand shapes were identified with an unsupervised procedure so that typologies could be discovered without labels. Each canteen-day was represented as a 26-bin vector (11:00–14:00, 7 min bins) and standardized row-wise to zero mean and unit variance so that distances emphasized shape rather than level [16]. The standardized vectors were then partitioned by k-means with k-means++ seeding over k ∈ {2,3,4}, a simple, fast choice that reduces sensitivity to poor initializations [42,43]. The Scikit-learn (v1.1.3) library was used to perform k-means++ seeding with the parameters of n_init = 25, max_iter = 300 (default), and random_state = 42 (random seed).

The number of clusters k was selected by average silhouette width (ASW), computed on the standardized vectors. The silhouette index provides an internal validity measure that balances within-cluster cohesion and between-cluster separation. Higher values indicate cleaner, more interpretable partitions. When ASW values were similar across neighboring k (ties < 0.02), a parsimony rule (choosing the smaller k) was applied to avoid over-segmenting the data, which aligns with best practice in clustering validity [44]. A minimum cluster size of ≥ 10% and a mean silhouette ≥ 0.25 are required as pragmatic thresholds for a small-sample field feasibility demonstration (thresholds summarized in Appendix A, Table A2). ASW interpretation (≥0.25 = weak but non-spurious structure) follows standard guidance from Rousseeuw/Kaufman and summaries widely used in practice [45].

Cluster stability was assessed with prediction strength (PS), an out-of-sample reproducibility index [46]. For each split, the data were halved, then each half was clustered by k-means (25 restarts), and points in one half were assigned to the other half’s centroids. For every predicted cluster, the co-membership rate (share of point pairs that also co-occurred in the other half’s clustering) was computed. PS(k) was the minimum of these rates across clusters, combined over directions, and averaged across many splits. The usual rule selects the largest k with PS(k) ≥ 0.8–0.9. Given the short horizon and modest sample, a feasibility screen of PS ≥ 0.75 was adopted. The chosen solution (k = 2, PS = 0.785) passes this screen and also satisfies parsimony and silhouette considerations. Higher PS indicates a more reproducible structure. For deployment in other sites, or in terms of the atlas (centroids), k, and the similarity gate τ were expected to be re-estimated using local historical data and a forward hold-out check.

Typologies were obtained by taking each cluster centroid and labeling it by its peak time (e.g., Early, Noon, Late). Centroids served as prototype shapes commonly used in operational analytics, aiding communication and mapping to simple playbooks [47]. The resulting atlas was frozen for deployment, enabling stable same-day decisions and lowering maintenance relative to continuously retrained forecasting pipelines. Periodic offline refreshes (monthly or after semester changes) accommodate seasonality while preserving day-to-day consistency [26,48]. This discovery-then-freeze workflow aligns with standard guidance in time-series clustering surveys and clustering-validation literature, which emphasize (a) normalization for shape, (b) robust seeding/initialization, (c) explicit model selection with internal indices, and (d) clear reporting of cluster quality [16,42,44]. The exact k-means++ initialization, clustering calls, and centroid-freezing steps are available in the same repository, including configuration files that reproduce all results.

Once clusters are frozen, they can be used for same-day classification operations by comparing the standardized five-bin prefix (11:00–11:28) for each canteen against the corresponding five-bin prefixes of the frozen centroids using cosine similarity. For a given canteen day, let

z_{d, 1 : 5}

denote the prefix and

c_{k, 1 : 5}

the centroid prefix for typology k. For all k,

s_{k} = c o s (z_{d, 1 : 5}, c_{k, 1 : 5})

was computed, and the day was assigned to the typology with the largest similarity when

m a x_{k} s_{k} \geq τ = 0.90

. In the operational evaluation, an Early Peak badge is issued only when the provisional typology label is Early Peak, and the corresponding similarity s_{Early Peak} meets Early Peaks _{Early Peak} ≥ τ. If the best-matching centroid corresponds to any other typology, or if the Early Peak label has cosine similarity s_{Early Peak} < τ = 0.90, no Early Peak badge is issued, and such days are treated as “no-action” cases in the operational evaluation.

In summary, the artifact is a two-cluster atlas (Early Peak and Late Shift) combined with a one-sided Early-only gate: only high-confidence Early Peak days trigger an Early Peak badge and actions. Late Shift and low-confidence days serve as “no-action” baselines for evaluation.

3.5. Temporal Hold-Out Protocol

A temporal hold-out protocol evaluates forward-in-time generalization: the atlas and threshold are fixed on earlier days and assessed on later “testing” days to avoid temporal leakage, following time-series evaluation practice. Together, PS was used to support the selected structure (k, centroids), while the temporal hold-out split was used to assess the forward-in-time behavior of the early rule. During the temporal hold-out protocol, frozen clusters trained on Weeks 1–2 (10 weekdays) were applied prospectively to the subsequent 6 weekdays (held-out) with a fixed cosine threshold (τ = 0.90), and early-classification performance was summarized at ~11:28 on the held-out days. Two primary metrics were computed: (i) coverage, the fraction of canteen days for which the 11:28 similarity exceeded the fixed gate τ = 0.90, and (ii) agreement, the fraction of covered days where the 11:28 Early Peak badge matched the end-of-day typology label. For context, precision (PPV) and recall (sensitivity) for the Early Peak class were also reported in Section 4.

3.6. Statistical Analysis

The operational policy was pre-specified to act on days whose 11:28 prefix was classified as Early Peak with confidence (cosine similarity ≥ τ, at τ = 0.90). The operational impact of Early Peak badge guidance was evaluated using a matched-pairs design. Within each canteen–weekday combination, intervention days (i.e., days on which the standardized information action was delivered) were matched to the nearest prior non-intervention (no-action) day, yielding 20 canteen–weekday pairs, under a partial-interference assumption that actions at a given canteen do not materially affect other canteens on the same day. Cross-canteen spillovers are not estimated in the primary analysis.

The primary outcome, PeakProxy, was defined as the maximum standardized crowd density z-score in the 11:00–14:00 window. If the intervention shifts demand away from the peak or flattens it, the PeakProxy difference should move in the expected direction [6]. Let d₁,..., d_n denote paired differences (non-intervention–intervention). Paired differences were analyzed with a one-sided Wilcoxon signed-rank test (alternative: reduction > 0). The paired effect was summarized by the Hodges-Lehmann (HL) estimator, defined as the median of the Walsh averages of the paired differences. This estimator is the standard location-shift estimate associated with the signed-rank procedure [49]. The Hodges-Lehmann estimator was computed as the median of the n(n + 1)/2 Walsh averages,

(d_{i} + d_{j}) / 2

for 1 ≤ i ≤ j ≤ n. For interval estimation, a one-sided 95% lower bound for HL was obtained using the bootstrap percentile method [50]. For the bootstrap percentile CI, B = 2000 resamples were drawn with replacement from the paired differences, and the HL estimator was recomputed on each resample. The generator was initialized with seed = 42 (numpy.random.default_rng(42), NumPy v1.24 so identical resamples are obtained across runs. Using 2000–5000 replications is common in practice for percentile CIs, and 2000 was adopted here as a balance of stability and runtime [51]. Annotated analysis notebooks in the repository provide the full implementation of the matched-pair construction, Wilcoxon and Hodges-Lehmann calculations, and bootstrap procedures, enabling exact reproduction of all reported estimates.

Robustness of the matched-pairs result was assessed in two ways. First, repeated non-intervention baseline days within each canteen–weekday stratum were restricted to be used at most once (“unique-baseline” restriction). Second, leave-one-canteen-out re-estimation was conducted to diagnose whether the paired association was disproportionately driven by a single canteen. The analysis code has been included in the repository mentioned in the Data Availability Statement.

The matched-pairs design estimates the median difference in peak crowding between intervention days and the nearest prior non-intervention day within the same canteen and weekday, assuming peaks would have followed similar short-term trends without the information intervention. Because downstream actions were not standardized or fully observed, the causal effects of specific actions were not identified, and only an association with the standardized information intervention was estimated. Spillovers across canteens and longer-run adaptation were not identified and were treated as limitations.

4. Results

4.1. Dataset and Evaluation Split

Across 5 canteens, 10 weekdays were used to learn the centroid atlas and select k. Chronologically, 6 later weekdays were held out for evaluation using a forward-in-time split (rolling/blocked origin). The atlas (k, centroids) was fixed on the training window and applied unchanged to held-out days. No tuning against the hold-out was performed. The τ = 0.90 was specified a priori from operational considerations.

4.2. Clustering and Internal Fit

Daily 11:00–14:00 curves were clustered into k = 2 shapes. The Early Peak and Late Shift clusters contained 37 and 13 canteen days, respectively. Overall separation was moderate (silhouette ≈ 0.49). To quantify uncertainty in this small sample, canteen days were bootstrap-resampled (B = 2000), and ASW was recomputed, producing a 95% CI of approximately [0.427, 0.587] (median 0.509). A two-dimensional t-SNE embedding of the 26-bin standardized daily curves was generated for visualization in Appendix C, Figure A1. A compact cluster and a more dispersed cluster were observed, with partial separation in the embedding, which was consistent with ASW values near 0.5. Because t-SNE is sensitive to hyperparameters and is not a validity test, it was used only for qualitative inspection and was not used for model selection or inference.

Across bootstrap splits, prediction strength (PS) for k = 2 averaged 0.785 (SD = 0.15). The alternative k values had a lower PS (Table 2).

Human-readable labels were assigned by centroid peak times (Early Peak: 11:35; Late Shift: 12:31), and atlas plots are shown in Figure 2. Together, ASW ≈ 0.49 and PS ≈ 0.785 indicate moderate separation and moderate resampling stability for k = 2. Therefore, a two-shape atlas was treated as sufficient for a feasibility demonstration rather than as evidence of strongly separated latent classes. Given the moderate ASW, the two clusters were interpreted as a practical partition that supports an early warning rule in this setting, rather than as definitive evidence of strongly separated latent classes.

4.3. Temporal Hold-Out Stability

Clusters trained on Weeks 1–2 (10 weekdays) were frozen and then applied prospectively to Week 3 (held-out) days with a fixed cosine threshold (τ = 0.90). At 11:28 (badge time), 22/30 held-out canteen days (0.73) were confidently classified using the prefix-only rule (cosine ≥ 0.9). Among confidently classified canteen days, 19 matched the end-of-day cluster label (agreement = 0.86). Early Peak’s precision was 0.90, and its recall was 0.81. The confusion matrix is provided in Table 3. High precision on the Early Peak at 11:28 favors safe triggering of actions.

Sensitivity checks at τ ∈ {0.85, 0.90, 0.95} showed the expected trade-off between coverage and precision in Table 4.

Although τ = 0.85 and τ = 0.90 yielded the same observed precision on this fold, τ = 0.90 was pre-specified to satisfy a precision floor without tuning on the hold-out, and it provides a more conservative risk-coverage operating point under the reject-option model where false-alert costs exceed the value of a few additional decisions. The threshold τ was interpreted as the operating point of a reject-option classifier: a badge was issued only when prefix similarity exceeded τ. Otherwise, the day was left unflagged (“reject”), and routine monitoring continued. This conservative choice prioritizes avoiding false positives (unnecessary alerts and loss of managerial trust) over capturing every early peak, consistent with decision-theoretic treatments of classification with rejection under asymmetric costs [52].

4.4. Eligibility and Pairing

For outcome evaluation, intervention days were restricted to those whose 11:28 prefix was confidently classified as Early Peak (cosine ≥ 0.9). Within this eligible set, matched pairs were constructed on canteen-weekday pairs, selecting the closest non-intervention per intervention day. This resulted in 20 pairs spanning five canteens (see Appendix B, Table A3).

4.5. Primary Outcome (PeakProxy)

PeakProxy was computed per day using the same preprocessing as in training (winsor = 0.995, smoothing window = 3). Across 20 pairs, mean PeakProxy fell from 1.487 (non-intervention days) to 1.319 (intervention days), a mean absolute reduction of 0.168 (≈9.39%). The Hodges-Lehmann median paired reduction was 0.193 z-units with a one-sided 95% lower bound of 0.068, and the Wilcoxon signed-rank test was W = 169, p = 0.00765 (one-sided). Effects were interpreted using the HL estimate with a one-sided 95% bootstrap percentile lower bound, consistent with the one-sided Wilcoxon formulation (see Methodology). Matched peak comparison is given in Figure 3. Interpreting the one-sided HL lower bound (≥0.068 z-units) was performed as a policy-relevant minimum gain at this sample size.

Sensitivity analysis of the matched pairs resulted in the following: under the primary matching rule, 17/20 pairs were separated by 7 days and 3/20 by 14 days. Under the unique-baseline restriction (n = 17), the Hodges–Lehmann estimate remained positive (HL = 0.1671 z-units; one-sided Wilcoxon p = 0.0284), with 13/17 pairs favoring lower peaks on intervention days. Leave-one-canteen-out estimates remained positive across all folds (HL range 0.1195–0.2595), while precision varied (e.g., drop E1: p = 0.0765), reflecting small-sample heterogeneity rather than a single dominant canteen. Complete sensitivity analysis results are given in Table A4 (Appendix B).

5. Discussion

5.1. Managerial Implications for Service Operations

Findings complement Wi-Fi–based occupancy research that links AP counts with occupancy and service-operations signals. Prior studies show Wi-Fi telemetry can proxy occupancy patterns and support operations decisions in offices and campuses [9,10,12]. Here those works were extended by operationalizing a same-day decision rule and evaluating peak-outcome differences under a short-horizon matched-pairs feasibility design in a real canteen setting. Consistent with those reports, leveraging existing Wi-Fi infrastructure offers a low-cost, privacy-preserving signal for operations, while one-sided HL framing emphasizes minimum gains rather than large-sample effect precision. A one-sided formulation was prespecified because the policy objective concerns reductions in peak load. HL with a one-sided lower bound therefore reports the minimum policy-relevant improvement, consistent with the signed-rank test direction [53].

Results suggest that a lightweight, no-new-hardware pipeline can convert facility Wi-Fi telemetry into same-day operational signals (demand typologies and a pre-noon badge). A modest noon-peak trimming of the magnitude observed is consistent with shorter waiting times and higher throughput in cafeteria discrete-event simulation studies, where targeted capacity reallocations yield measurable queue-time improvements [5,6]. At 0.73 coverage and 0.9 precision on held-out days, the badge was used as a high-confidence trigger for a standardized information action (dashboard display plus a pre-written LMS announcement encouraging staggering/deflection), rather than for mandated staffing or capacity changes. These actions are shown in service-operations literature to reduce peak delays without new capital [54]. Evidence from building research also supports Wi-Fi AP connections as a practical occupancy proxy, reinforcing the choice [9]. These results indicate a positive median reduction with a one-sided 95% lower bound above zero, consistent with a policy that is expected only to reduce peak load.

5.2. Interpretability and Maintainability

Freezing a compact atlas of centroids and using cosine on fixed bins keeps decision rules transparent and low-maintenance, unlike forecasting pipelines that usually require retraining and drift monitoring. Prediction Strength (for reproducibility) and Silhouette (for separation) were used to support the selection of k before deployment. Temporal hold-out demonstrates forward-in-time behavior of the early rule [46]. The interpretability of early badges was favored by the shape-first representation and a frozen atlas. The maintenance burden was reduced through a train-once, refresh approach periodically. These choices were made to support repeatable, transparent decisions. Transferability was framed as the ability to apply the same interpretability-first procedure under local calibration, rather than as a claim that the same typology identities would recur unchanged across sites.

5.3. Scope, Limitations, and External Validity

This study targets lunchtime canteen operations where class timetable “waves” make pre-noon signals operationally valuable. Twenty intervention–non-intervention matched pairs were analyzed, which supports paired nonparametric inference but limits precision. Therefore, estimation was emphasized using one-sided Hodges–Lehmann bounds and feasibility interpretation rather than definitive effectiveness claims. Most intervention days exhibited lower peaks, but not all; the Hodges–Lehmann estimator summarizes the median paired effect under small-sample asymmetry. Findings are limited to academic-schedule service points with sufficient Wi-Fi access points (AP) coverage, where AP association counts have been shown to track facility-level occupancy in similar settings [9,13,19].

Several limitations apply. The horizon is short (weeks), so uncertainty is wide, and time-varying confounding from weather, events, and schedule changes cannot be ruled out. Causal attribution was not attempted, and outcomes were interpreted as feasibility associations using exact nonparametric tests and Hodges–Lehmann differences. The action window coincides with a specific term period, so calendar effects (e.g., assessments/exams) may shift behavior, and results should be read as period-specific rather than season-wide impacts [55]. Wi-Fi telemetry counts devices rather than people, and AP placement/controller behavior may bias magnitudes; therefore, emphasis was placed on z-standardized shapes and prefix cosines, and counts were treated as relatively crowd-pressure signals rather than headcounts [9]. Spillovers between nearby canteens may occur, so effects were interpreted within canteens (partial interference), limiting network-wide claims [56].

Finally, the cluster atlas (k = 2; silhouette ≈ 0.49) and the Early Peak/Late Shift labels are contingent on the setting, horizon, and timetable structure; other sites may yield different atlases. External validity is expected to improve when comparable lunch windows and surges exist and when the atlas refresh is triggered periodically to handle drift. Overall, findings should be read as a feasibility demonstration motivating a larger confirmatory study with pre-registered metrics and expanded sampling [57]. Porting is expected to require re-estimation of k, retraining the atlas, and periodic review of τ.

5.4. Operationalization and Policy Integration

The intervention in this study was restricted to a standardized information action: (i) displaying an Early Peak badge on the dashboard and (ii) posting a short, pre-written LMS announcement at 11:29, encouraging staggering/deflection (e.g., delaying departure or using alternative nearby canteens) (Table A5 and Table A6, Appendix D). No standardized staffing, counter-opening, pricing, or incentive protocol was mandated, so the outcome analysis is framed as an association under routine operations rather than a controlled causal estimate. No incentives or penalties were used. A frequency cap of one trigger per canteen-day was enforced, and abstention occurred automatically when the similarity fell below the threshold. Consistency was supported by fixed trigger timing, a single message template, and logging of posting status. Intervention components, timing, and compliance observability are presented in Table A7 (Appendix D). Detailed interface text, delivery settings, and hardware and software specifications are provided in the repository given in the Data Availability Statement.

5.5. Smart-City Implications

The case study suggests that smart-campus Wi-Fi telemetry can be converted into a daily, actionable early-warning signal for crowd management without deploying additional sensors or maintaining complex forecast models. Although the present work focuses on lunchtime queuing in university canteens, the same design pattern, like clustering daily occupancy curves into typologies, freezing centroids, and issuing prefix-based badges, could be adapted to other smart-city venues that experience recurrent demand surges, such as transit interchanges, event halls, or tourist attractions. The university campus functions as a low-risk living lab where lightweight Wi-Fi-based early warning rules can be prototyped, audited, and revised before considering deployment in higher-stakes municipal contexts. In such settings, the artifact contributes to the smart-living, smart-mobility, and smart-governance dimensions of smart cities [1,2] by supporting peak-smoothing decisions under routine operations, while requiring longer-horizon controlled evaluation for causal attribution. The approach could be applied to other campus facilities, malls, transit hubs, etc., wherever there is high-frequency occupancy telemetry. Furthermore, the prefix-matching rule is transparent enough to be communicated to non-technical managers, which is critical for aligning IoT analytics with governance and operational decision-making in smart-city programs [14].

6. Conclusions

A short-horizon field feasibility study was conducted in which smart-campus Wi-Fi infrastructure was repurposed as a privacy-preserving sensing layer for everyday crowd management. An atlas of recurrent lunchtime demand typologies was derived from 7-min access-point counts in five university canteens, and a simple prefix-based early-warning rule was designed to classify each day’s 11:28 prefix into Early Peak or Late Shift, and to issue an Early Peak badge when similarity to the Early Peak centroid exceeds a preset threshold. By freezing the cosine-similarity atlas and exposing only a binary badge, the system translates complex time-series patterns into a single, interpretable signal that can be integrated into daily operational routines.

In matched canteen-weekday pairs, the Early Peak badge guidance was associated with a Hodges-Lehmann median reduction of 0.193 standard-deviation units in peak crowding, with a one-sided 95% lower bound of 0.068. These findings were consistent with feasibility-level peak trimming under routine operations. However, because response actions were not standardized and time-varying confounding could not be excluded, the outcome results were interpreted as associations rather than definitive effectiveness estimates. Given the non-randomized, short-horizon design and the absence of a mandated staffing protocol, the findings support feasibility and directional promise rather than definitive causal impact.

From a Smart Cities perspective, the contribution is twofold. First, the approach shows how IoT-related telemetry that is already collected for network operations can be converted into a lightweight decision-support artifact, without deploying new sensors or maintaining complex forecasting models. Second, the artifact supports smart-governance objectives by remaining transparent to non-technical managers: the Early Peak/Late Shift badge, the underlying typologies, and the associated actions can be explained and audited, which is essential for acceptance in real operational settings.

Although the case study focused on university canteens, the same design pattern of clustering daily occupancy curves, freezing centroids, and issuing prefix-based badges could be adapted to other smart-city venues that face recurrent demand surges, such as transit interchanges, event halls, or tourist attractions. Future work may extend the framework to multi-venue coordination, integrate additional data sources (such as weather or timetable changes), and explore how early-warning badges interact with other smart-city services and governance mechanisms. Taken together, the results indicate that simple, interpretable rules built on existing Wi-Fi telemetry can be prototyped as a low-maintenance decision aid in smart-campus living-lab settings, while broader smart-city transfer should be validated through longer-horizon, controlled designs and standardized response protocols.

Author Contributions

A.V.: Conceptualization; Methodology; Software; Formal analysis; Visualization; Writing—original draft. T.A.: Conceptualization; Supervision; Project administration; Methodology; Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study processed aggregate Wi-Fi association counts supplied by the university IT team. No device identifiers or personal data (e.g., MAC addresses) were stored or analyzed.

Data Availability Statement

Anonymized, aggregate Wi-Fi access-point counts and all analysis code are openly available at: https://osf.io/wcean/overview?view_only=9c94a08980304845b4ea4c13c2805f43 (accessed on 24 December 2025). The repository contains the full Python (v3.10) notebooks and configuration files used in this study, including all preprocessing, clustering, early-matching, and statistical analysis routines. These materials reproduce every figure and table in the paper and document all computational details that are not exhaustively described in the main text.

Acknowledgments

This study was supported by Mae Fah Luang University, Thailand.

Conflicts of Interest

The authors declare no competing financial or personal interests that could have influenced the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:

AP	Access Points
ETSC	Early time-series classification
HL	Hodges-Lehmann
CI	Confidence Interval
LMS	Learning Management System

Appendix A

Table A1. Core pre-specified parameters.

Parameter	Value	Choice Rationale	Operational Implication
Time window	11:00–14:00	Bounds the lunch phenomenon; trims irrelevant tails before/after the service peak.	All actions/metrics reference this 3 h window.
Bin size	7 min (26 bins/day)	Enough resolution to capture curve shape without making the early badge jittery. Also, it is a rate at which campus Wi-Fi publishes updates	Manager actions align to ~15 min blocks; 2 bins ≈ 14 min.
QA Winsorization	High-quantile cap (0.995) on single-bin spikes	Removes implausible probe storms/AP handover bursts without distorting day-shape.	Stabilizes measurements; prevents one-offs from driving labels.
QA Smoothing	Centered 3-bin moving average	Removes one-bin jitter while preserving peak timing.	Keeps peak time intact for action timing.
Standardization	Row-wise z-score per day	Emphasizes shape/timing over absolute volume; allows fair cross-day/canteen comparison.	Comparable typology across locations; stable early similarity.
Clustering method	k-means++ (Euclidean) on z-curves	Simple, fast, explainable; centroids are interpretable for operations.	Produces human-readable centroids for playbooks.
k search range	{2,3,4}	Lunch window admits few meaningful shapes; larger k risks micro-clusters.	Limits taxonomy to actionable labels (Early/Noon/Late style).
Model selection	Average silhouette (ASW, mean over all days)	Standard separation/compactness measure for z-curves. (ASW ≥ 0.25 = weak but non-spurious.)	Objective choice of k before labeling/operations.
Parsimony tie-break	If ΔASW < 0.02, choose the smaller k	Two-hundredths gain is negligible operationally; avoids label sprawl.	Prevents over-segmentation for trivial fit gains.
Badge decision time	11:28 (pre-specified)	By ~11:28 the emerging shape is visible from early bins.	Managers can act before noon with confidence.
Badge threshold τ	0.90 (cosine similarity on 5-bin prefix)	Demands strong shape alignment (~≤26°); favors accuracy over volume.	Fewer whipsaws; “no-call” allowed when shape is ambiguous.
Primary outcome	PeakProxy = max z (11:00–14:00)	Captures peak strain relevant to crowd and prep pressure.	Directly tied to spike-flattening objective.
Evaluation design	Matched pairs (same canteen and weekday; nearest prior non-intervention)	Removes fixed effects and weekly cycles.	Action vs. non-intervention comparisons.

Table A2. Acceptance gates (all must pass).

Gate	Threshold/Rule	Justification	If Fails
G1. Mean silhouette (ASW)	ASW ≥ 0.25 on full-day z-curves	Ensures at least weak but real structure; <0.25 suggests mushy/overlapping clusters.	Reject solution; reduce model complexity (smaller k) and re-fit.
G2. Parsimony tie-break	If Δ ASW < 0.02 between adjacent k → choose smaller k	Negligible gain for operations; avoids micro-clusters and label sprawl.	Pick lower-k solution; re-name centroids accordingly.
G3. Minimum cluster size	No cluster < 10% of days (training)	Tiny clusters are hard to interpret or act on; often noise.	Reject solution or reduce k; merge rare class to nearest centroid.
G4. Temporal separation of centroids	Centroid peak times strictly ordered and ≥2 bins apart (≥14 min)	Matches ~15 min staffing granularity; keeps labels distinct and actionable.	Merge offending centroids or reduce k until peaks are distinct.
G5. Early badge agreement @ 11:28	With τ = 0.90, agreement ≥ 0.80 (badge vs. end-of-day) on held-out days	Prefers precision over volume so actions are not whipsawed.	Raise τ (more conservative) or allow no-call more often; if still <0.80, revisit k or QA.

Appendix B

Table A3. Pairing of non-intervention and intervention canteen days (n = 20).

Canteen	Weekday	Non-Interv_Date	Interv_Date	Non-Interv_Peak	Interv_Peak
C5	Friday	2025-09-12	2025-09-19	1.457045552	0.954388102
C5	Friday	2025-09-12	2025-09-26	1.457045552	1.188123592
C5	Tuesday	2025-09-16	2025-09-23	1.866290196	1.304073335
D1	Thursday	2025-09-11	2025-09-18	1.55701665	1.048540481
D1	Friday	2025-09-12	2025-09-19	1.079677234	1.186557105
D1	Monday	2025-09-15	2025-09-22	1.494757514	1.375221542
D1	Tuesday	2025-09-16	2025-09-23	1.384713842	1.361185674
E1	Thursday	2025-09-11	2025-09-18	1.740706542	1.443399294
E1	Friday	2025-09-12	2025-09-19	1.728115121	1.477944847
E1	Friday	2025-09-12	2025-09-26	1.728115121	1.241416601
E1	Monday	2025-09-15	2025-09-22	1.615932277	1.390465361
E1	Tuesday	2025-09-16	2025-09-23	1.414927516	1.055782321
E1	Wednesday	2025-09-17	2025-09-24	1.340376304	1.303394416
E2	Thursday	2025-09-11	2025-09-18	1.188954414	1.855193867
E2	Friday	2025-09-12	2025-09-19	1.248842941	1.151039237
E2	Friday	2025-09-12	2025-09-26	1.248842941	1.098024771
E2	Monday	2025-09-15	2025-09-22	1.361071296	1.315346868
E2	Tuesday	2025-09-16	2025-09-23	1.754980822	1.117164054
E2	Wednesday	2025-09-17	2025-09-24	1.327739891	1.760463255
S2	Friday	2025-09-12	2025-09-19	1.744861048	1.753496689

Table A4. Sensitivity analyses for matched-pairs: Primary (all pairs) vs. Unique baseline (no reused interventions) vs. Leave-one-canteen-out (LOCO) re-estimation.

Category	Dropped Canteen	Number of Pairs	Hodges-Lehmann	Wilcoxon W	Wilcoxon p-Value (One-Sided Greater)	% Positive Pairs	Mean Difference
Primary	(none)	20	0.19	169	0.008	80.00%	0.17
Unique baseline	(none)	17	0.17	117	0.028	76.50%	0.14
LOCO	C5	17	0.14	116	0.032	76.50%	0.12
LOCO	D1	16	0.22	108	0.019	81.30%	0.18
LOCO	E1	14	0.12	76	0.077	71.40%	0.12
LOCO	E2	14	0.26	100	0.001	85.70%	0.25
LOCO	S2	19	0.2	153	0.009	84.20%	0.18

Appendix C

Figure A1. t-SNE embedding of 26-bin standardized daily curves, colored by the Step-1 k-means cluster. The embedding was provided for qualitative visualization only; axes are t-SNE dimensions and are not directly interpretable as physical units. (t-SNE: perplexity = 10, init = PCA, learning_rate = auto, seed = 42).

Appendix D

Table A5. Logs from Dashboard indicating badge typology, cosine similarity, timestamp, action, online message sent status.

Canteen	Date	Time	Similarity	Typology	Confident	Action	Message_Status
D1	2025-09-18	11:29	0.98	Early Peak	TRUE	redirect	success
E1	2025-09-18	11:29	0.97	Early Peak	TRUE	redirect	success
E2	2025-09-18	11:29	0.97	Early Peak	TRUE	redirect	success
C5	2025-09-19	11:29	0.99	Early Peak	TRUE	redirect	success
D1	2025-09-19	11:29	0.91	Early Peak	TRUE	redirect	success
E1	2025-09-19	11:29	0.99	Early Peak	TRUE	redirect	success
E2	2025-09-19	11:29	1.00	Early Peak	TRUE	redirect	success
S2	2025-09-19	11:29	1.00	Early Peak	TRUE	redirect	success
D1	2025-09-22	11:29	0.98	Early Peak	TRUE	redirect	success
E1	2025-09-22	11:29	1.00	Early Peak	TRUE	redirect	success
E2	2025-09-22	11:29	0.93	Early Peak	TRUE	redirect	success
C5	2025-09-23	11:29	1.00	Early Peak	TRUE	redirect	success
D1	2025-09-23	11:29	0.97	Early Peak	TRUE	redirect	success
E1	2025-09-23	11:29	0.98	Early Peak	TRUE	redirect	success
E2	2025-09-23	11:29	1.00	Early Peak	TRUE	redirect	success
E1	2025-09-24	11:29	1.00	Early Peak	TRUE	redirect	success
E2	2025-09-24	11:29	0.99	Early Peak	TRUE	redirect	success
C5	2025-09-26	11:29	0.91	Early Peak	TRUE	redirect	success
E1	2025-09-26	11:29	0.97	Early Peak	TRUE	redirect	success
E2	2025-09-26	11:29	0.99	Early Peak	TRUE	redirect	success

Table A6. Example of an online message sent via LMS to students on 18 September, at 11:29 am.

Subject: Lunch routing today (avoid 11:30–12:30)

Lunch routing for 2025-09-18

Avoid 11:30–12:30: D1, E1, E2

Prefer (late/noon): C5, S2

If your usual canteen is listed to avoid, please use one of the ‘Prefer’ canteens.

Thanks for helping keep lines short!

Table A7. Intervention components, timing, and compliance observability.

Component	Trigger/Timing	Allowed Action	Compliance Level (Observed?)
Dashboard badge	Triggered when prefix cosine ≥ τ at ~11:28	Early Peak badge displayed on dashboard and logged in csv file	Observed (system event/log)
LMS announcement	posted at ~11:29 after badge is known	Pre-written message encouraging staggering/deflection	Observed (posting recorded)
Student response	Before main surge	Delay departure or use alternative nearby canteen	Not measured (behavioral compliance unobserved)
On-site adjustments (optional)	11:30–12:30	Queue marshaling/ad hoc service adjustments	Not standardized/not logged

References

Gracias, J.S.; Parnell, G.S.; Specking, E.; Pohl, E.A.; Buchanan, R. Smart Cities—A Structured Literature Review. Smart Cities 2023, 6, 1719–1743. [Google Scholar] [CrossRef]
Syed, A.S.; Sierra-Sosa, D.; Kumar, A.; Elmaghraby, A. IoT in Smart Cities: A Survey of Technologies, Practices and Challenges. Smart Cities 2021, 4, 429–475. [Google Scholar] [CrossRef]
Wiangwiset, T.; Surawanitkun, C.; Wongsinlatam, W.; Remsungnen, T.; Siritaratiwat, A.; Srichan, C.; Thepparat, P.; Bunsuk, W.; Kaewchan, A.; Namvong, A.; et al. Design and Implementation of a Real-Time Crowd Monitoring System Based on Public Wi-Fi Infrastructure: A Case Study on the Sri Chiang Mai Smart City. Smart Cities 2023, 6, 987–1008. [Google Scholar] [CrossRef]
Zhao, Z.; Wang, T.; Zhang, Y.; Wang, Z.; Geng, R. Geo-Visualization of Spatial Occupancy on Smart Campus Using Wi-Fi Connection Log Data. ISPRS Int. J. Geo-Inf. 2023, 12, 455. [Google Scholar] [CrossRef]
Ivan, J. The Impact of the Constraints of Class Scheduling on Campus Dining: A Simulation-Based Case Study. In Proceedings of the 33rd European Modeling & Simulation Symposium 2021, Virtual Event, 15–17 September 2021; pp. 266–271. [Google Scholar] [CrossRef]
Kambli, A.; Sinha, A.A.; Srinivas, S. Improving Campus Dining Operations Using Capacity and Queue Management: A Simulation-Based Case Study. J. Hosp. Tour. Manag. 2020, 43, 62–70. [Google Scholar] [CrossRef]
Zhang, Y.; Hu, K.; Yang, Y.; Li, D.; Deng, T.; Hu, Z.; Xu, Y. Energy-Saving and Decarbonization Design Optimization for School Canteen Buildings: A Case Study in Nanjing. Buildings 2025, 15, 455. [Google Scholar] [CrossRef]
Alishahi, N. Analyzing WiFi Connection Counts in Commercial/Institutional Buildings to Estimate/Predict Occupancy Patterns for Optimizing Buildings’ Systems Operation. Doctoral Dissertation, Concordia University, Montreal, QC, Canada, 2021. [Google Scholar]
Alishahi, N.; Ouf, M.M.; Nik-Bakht, M. Using WiFi Connection Counts and Camera-Based Occupancy Counts to Estimate and Predict Building Occupancy. Energy Build. 2022, 257, 111759. [Google Scholar] [CrossRef]
Ouf, M.M.; Issa, M.H.; Azzouz, A.; Sadick, A.-M. Effectiveness of Using WiFi Technologies to Detect and Predict Building Occupancy. Sust. Build. 2017, 2, 7. [Google Scholar] [CrossRef]
Determe, J.-F.; Azzagnuni, S.; Singh, U.; Horlin, F.; Doncker, P.D. Monitoring Large Crowds with WiFi: A Privacy-Preserving Approach. IEEE Syst. J. 2022, 16, 2148–2159. [Google Scholar] [CrossRef]
Rafsanjani, H.N.; Ghahramani, A. Extracting Occupants’ Energy-Use Patterns from Wi-Fi Networks in Office Buildings. J. Build. Eng. 2019, 26, 100864. [Google Scholar] [CrossRef]
Wang, Z.; Hong, T.; Piette, M.A.; Pritoni, M. Inferring Occupant Counts from Wi-Fi Data in Buildings through Machine Learning. Build. Environ. 2019, 158, 281–294. [Google Scholar] [CrossRef]
Valks, B.; Arkesteijn, M.; Koutamanis, A.; Heijer, A.D. Towards Smart Campus Management: Defining Information Requirements for Decision Making through Dashboard Design. Buildings 2021, 11, 201. [Google Scholar] [CrossRef]
Akasiadis, C.; Kladis, E.; Kamberi, P.-F.; Michelioudakis, E.; Alevizos, E.; Artikis, A. A Framework to Evaluate Early Time-Series Classification Algorithms. In Proceedings of the 27th International Conference on Extending Database Technology, Paestum, Italy, 25–28 March 2024. [Google Scholar]
Aghabozorgi, S.; Seyed Shirkhorshidi, A.; Ying Wah, T. Time-Series Clustering—A Decade Review. Inf. Syst. 2015, 53, 16–38. [Google Scholar] [CrossRef]
Senin, P.; Malinchik, S. SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; pp. 1175–1180. [Google Scholar]
Zaidi, A.; Ahuja, R.; Shahabi, C. Differentially Private Occupancy Monitoring from WiFi Access Points. In Proceedings of the 2022 23rd IEEE International Conference on Mobile Data Management (MDM), Paphos, Cyprus, 6–9 June 2022; pp. 361–366. [Google Scholar]
Vega-Barbas, M.; Álvarez-Campana, M.; Rivera, D.; Sanz, M.; Berrocal, J. AFOROS: A Low-Cost Wi-Fi-Based Monitoring System for Estimating Occupancy of Public Spaces. Sensors 2021, 21, 3863. [Google Scholar] [CrossRef] [PubMed]
Doma, A.; Ouf, M.M.; Amara, F.; Morovat, N.; Athienitis, A.K. Occupancy-Informed Predictive Control Strategies for Enhancing the Energy Flexibility of Grid-Interactive Buildings. Energy Build. 2025, 332, 115388. [Google Scholar] [CrossRef]
Arreeras, T.; Arimura, M.; Asada, T.; Arreeras, S. Association Rule Mining Tourist-Attractive Destinations for the Sustainable Development of a Large Tourism Area in Hokkaido Using Wi-Fi Tracking Data. Sustainability 2019, 11, 3967. [Google Scholar] [CrossRef]
Dimara, A.; Krinidis, S.; Tzovaras, D. occupI: A Novel Non-Intrusive Occupancy Inference Tool. In Proceedings of the 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), Rhodes Island, Greece, 2–6 November 2020; pp. 392–398. [Google Scholar]
Alqahtani, A.; Ali, M.; Xie, X.; Jones, M.W. Deep Time-Series Clustering: A Review. Electronics 2021, 10, 3001. [Google Scholar] [CrossRef]
Paparrizos, J.; Gravano, L. K-Shape: Efficient and Accurate Clustering of Time Series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 31 May–4 June 2015; ACM: Melbourne, Victoria, Australia, 2015; pp. 1855–1870. [Google Scholar]
Kim, J.; Song, K.; Lee, G.; Lee, S. Time-Series Data Clustering with Load-Shape Preservation for Identifying Residential Energy Consumption Behaviors. Energy Build. 2024, 311, 114130. [Google Scholar] [CrossRef]
Wang, J.; Zhu, X.; Mather, B. A Two-Step Time-Series Data Clustering Method for Building-Level Load Profile. In Proceedings of the 2023 IEEE Power & Energy Society General Meeting (PESGM), Orlando, FL, USA, 16–20 July 2023; IEEE: Orlando, FL, USA, 2023; pp. 1–5. [Google Scholar]
Michalakopoulos, V.; Sarmas, E.; Papias, I.; Skaloumpakas, P.; Marinakis, V.; Doukas, H. A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs. Appl. Energy 2024, 361, 122943. [Google Scholar] [CrossRef]
Liu, X.; Zhang, S.; Wang, X.; Wu, R.; Yang, J.; Zhang, H.; Wu, J.; Li, Z. Clustering Method Comparison for Rural Occupant’s Behavior Based on Building Time-Series Energy Data. Buildings 2024, 14, 2491. [Google Scholar] [CrossRef]
Hassan, B.A.; Tayfor, N.B.; Hassan, A.A.; Ahmed, A.M.; Rashid, T.A.; Abdalla, N.N. From A-to-Z Review of Clustering Validation Indices. Neurocomputing 2024, 601, 128198. [Google Scholar] [CrossRef]
Pavlopoulos, J.; Vardakas, G.; Likas, A. Revisiting Silhouette Aggregation. In Proceedings of the Discovery Science: 27th International Conference, DS 2024, Pisa, Italy, 14–16 October 2024; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2025; pp. 354–368. [Google Scholar]
Ullmann, T.; Hennig, C.; Boulesteix, A.-L. Validation of Cluster Analysis Results on Validation Data: A Systematic Framework. WIREs Data Min. Knowl. Discov. 2022, 12, e1444. [Google Scholar] [CrossRef]
Iglesias Vázquez, F.; Zseby, T. Temporal Silhouette: Validation of Stream Clustering Robust to Concept Drift. Mach. Learn. 2024, 113, 2067–2091. [Google Scholar] [CrossRef]
Lee, M.-C.; Lin, J.-C.; Stolz, V. Evaluation of K-Means Time Series Clustering Based on Z-Normalization and NP-Free. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods, Rome, Italy, 24–26 February 2024; SCITEPRESS—Science and Technology Publications: Rome, Italy, 2024; pp. 469–477. [Google Scholar]
Tawakuli, A.; Havers, B.; Gulisano, V.; Kaiser, D.; Engel, T. Survey:Time-Series Data Preprocessing: A Survey and an Empirical Analysis. J. Eng. Res. 2025, 13, 674–711. [Google Scholar] [CrossRef]
Schäfer, P.; Leser, U. TEASER: Early and Accurate Time Series Classification. Data Min. Knowl. Discov. 2020, 34, 1336–1362. [Google Scholar] [CrossRef]
Bilski, J.M.; Jastrzębska, A. CALIMERA: A New Early Time Series Classification Method. Inf. Process. Manag. 2023, 60, 103465. [Google Scholar] [CrossRef]
Zhang, Y.-F.; Wen, Q.; Wang, X.; Chen, W.; Sun, L.; Zhang, Z.; Wang, L.; Jin, R.; Tan, T. OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates Inc.: Red Hook, NY, USA, 2023. [Google Scholar]
Liu, Y.; Li, C.; Wang, J.; Long, M. Koopa: Learning Non-Stationary Time Series Dynamics with Koopman Predictors. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates Inc.: Red Hook, NY, USA, 2023. [Google Scholar]
Liu, Z.; Cheng, M.; Li, Z.; Huang, Z.; Liu, Q.; Xie, Y.; Chen, E. Adaptive Normalization for Non-Stationary Time Series Forecasting: A Temporal Slice Perspective. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates Inc.: Red Hook, NY, USA, 2023. [Google Scholar]
Tcholtchev, N.; Schieferdecker, I. Sustainable and Reliable Information and Communication Technology for Resilient Smart Cities. Smart Cities 2021, 4, 156–176. [Google Scholar] [CrossRef]
Kim, M.-L.; Park, K.-J.; Son, S.-Y. Occupancy-Based Energy Consumption Estimation Improvement through Deep Learning. Sensors 2023, 23, 2127. [Google Scholar] [CrossRef]
Arthur, D.; Vassilvitskii, S. K-Means++: The Advantages of Careful Seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, LA, USA, 7–9 January 2007. [Google Scholar]
Bachem, O.; Lucic, M.; Hassani, H.; Krause, A. Fast and Provably Good Seedings for K-Means. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
Halkidi, M.; Batistakis, Y.; Vazirgiannis, M. On Clustering Validation Techniques. J. Intell. Inf. Syst. 2001, 17, 107–145. [Google Scholar] [CrossRef]
Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
Tibshirani, R.; Walther, G. Cluster Validation by Prediction Strength. J. Comput. Graph. Stat. 2005, 14, 511–528. [Google Scholar] [CrossRef]
Chicco, G. Overview and Performance Assessment of the Clustering Methods for Electrical Load Pattern Grouping. Energy 2012, 42, 68–80. [Google Scholar] [CrossRef]
Li, K.; Ma, Z.; Robinson, D.; Ma, J. Identification of Typical Building Daily Electricity Usage Profiles Using Gaussian Mixture Model-Based Clustering and Hierarchical Clustering. Appl. Energy 2018, 231, 331–342. [Google Scholar] [CrossRef]
Hollander, M.; Wolfe, D.A.; Chicken, E. The One-Sample Location Problem. In Nonparametric Statistical Methods; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2015; pp. 39–114. ISBN 978-1-119-19603-7. [Google Scholar]
Chernick, M.R. Bootstrap Methods: A Guide for Practitioners and Researchers; Wiley-Interscience: Hoboken, NJ, USA, 2008; ISBN 978-0-471-75621-7. [Google Scholar]
Banjanovic, E.S.; Osborne, J.W. Confidence Intervals for Effect Sizes: Applying Bootstrap Resampling. Pract. Assess. Res. Eval. 2016, 21, 5. [Google Scholar] [CrossRef]
Herbei, R.; Wegkamp, M.H. Classification with Reject Option. Can. J. Stat. 2006, 34, 709–721. [Google Scholar] [CrossRef]
Wilcoxon, F. Individual Comparisons by Ranking Methods. In Breakthroughs in Statistics: Methodology and Distribution; Kotz, S., Johnson, N.L., Eds.; Springer: New York, NY, USA, 1992; pp. 196–202. ISBN 978-1-4612-4380-9. [Google Scholar]
Gumus, S.; Bubou, G.M.; Oladeinde, M.H. Application of Queuing Theory to a Fast Food Outfit: A Study of Blue Meadows Restaurant. Indep. J. Manag. Prod. 2017, 8, 441–458. [Google Scholar] [CrossRef]
Wansink, B.; Cao, Y.; Saini, P.; Shimizu, M.; Just, D.R. College Cafeteria Snack Food Purchases Become Less Healthy with Each Passing Week of the Semester. Public Health Nutr. 2013, 16, 1291–1295. [Google Scholar] [CrossRef] [PubMed]
Benjamin-Chung, J.; Arnold, B.F.; Berger, D.; Luby, S.P.; Miguel, E.; Colford, J.M., Jr.; Hubbard, A.E. Spillover Effects in Epidemiology: Parameters, Study Designs and Methodological Considerations. Int. J. Epidemiol. 2018, 47, 332–347. [Google Scholar] [CrossRef]
Lancaster, G.A.; Thabane, L. Guidelines for Reporting Non-Randomised Pilot and Feasibility Studies. Pilot Feasibility Stud. 2019, 5, 114. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Overview of methodological pipeline (Preprocessing → Atlas training → Same-day prefix match (11:28) → Pairing → Wilcoxon/HL).

Figure 2. Typology atlas of Early Peak and Late Shift.

Figure 3. Peak comparison of matched pairs (non-intervention vs. intervention).

Table 1. Canteen capacity and number of APs covering the canteen area.

Canteen	Capacity	Number of APs
C5	280	3
D1	720	10
E1	390	4
E2	440	4
S2	150	2

Table 2. PS scores at each k.

k	PS_mean	PS_std
2	0.785	0.156
3	0.440	0.052
4	0.329	0.197

Table 3. Confusion matrix of Early Peak on hold-out days.

True/Predicted	Early Peak	Late Shift
Early Peak	18 (94.7%)	1 (5.3%)
Late Shift	2 (66.7%)	1 (33.3%)

Note. Rows = true class; columns = predicted class.

Table 4. Sensitivity check at various cosine similarity thresholds (τ).

(τ)	Decided Days D	Coverage Days D/30	Early Peak Predictions (TP + FP)	Correct Early Peak Predictions (TP)	Precision TP/(TP + FP)
0.85	23	0.77	20	18	0.90
0.90	22	0.73	20	18	0.90
0.95	18	0.60	17	16	0.94

Note. True Positive (TP) and False Positive (FP).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Variskhanov, A.; Arreeras, T. Early Peak Badges from Wi-Fi Telemetry: A Field Feasibility Study of Lunchtime Crowd Management on a Smart Campus. Urban Sci. 2026, 10, 29. https://doi.org/10.3390/urbansci10010029

AMA Style

Variskhanov A, Arreeras T. Early Peak Badges from Wi-Fi Telemetry: A Field Feasibility Study of Lunchtime Crowd Management on a Smart Campus. Urban Science. 2026; 10(1):29. https://doi.org/10.3390/urbansci10010029

Chicago/Turabian Style

Variskhanov, Anvar, and Tosporn Arreeras. 2026. "Early Peak Badges from Wi-Fi Telemetry: A Field Feasibility Study of Lunchtime Crowd Management on a Smart Campus" Urban Science 10, no. 1: 29. https://doi.org/10.3390/urbansci10010029

APA Style

Variskhanov, A., & Arreeras, T. (2026). Early Peak Badges from Wi-Fi Telemetry: A Field Feasibility Study of Lunchtime Crowd Management on a Smart Campus. Urban Science, 10(1), 29. https://doi.org/10.3390/urbansci10010029

Article Menu

Early Peak Badges from Wi-Fi Telemetry: A Field Feasibility Study of Lunchtime Crowd Management on a Smart Campus

Abstract

1. Introduction

2. Related Work

2.1. Wi-Fi Access Point Telemetry as an Occupancy Proxy

2.2. Time-Series Clustering of Daily Load Shapes

2.3. Validating and Stabilizing Clusters

2.4. Early Time-Series Classification (ETSC) and Prefix Decisions

2.5. Clustering-Then-Early Matching vs. Forecasting

3. Materials and Methods

3.1. Methods Overview

3.2. Setting and Data

3.3. Preprocessing and Representation

3.4. Typology Discovery and Prefix-Based Matching

3.5. Temporal Hold-Out Protocol

3.6. Statistical Analysis

4. Results

4.1. Dataset and Evaluation Split

4.2. Clustering and Internal Fit

4.3. Temporal Hold-Out Stability

4.4. Eligibility and Pairing

4.5. Primary Outcome (PeakProxy)

5. Discussion

5.1. Managerial Implications for Service Operations

5.2. Interpretability and Maintainability

5.3. Scope, Limitations, and External Validity

5.4. Operationalization and Policy Integration

5.5. Smart-City Implications

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

Appendix A

Appendix B

Appendix C

Appendix D

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI