Keke-Aware Vehicle Counting for Traffic Measurement Using YOLO: Dataset and Field Evaluation

Akujobi, Moses U.; Abubakar, Abdulhameed U.; Mailabari, Raphael J.; Thuku, Iliya T.; Musa, Saidu Y.; Visa, Ibrahim M.; Abioye, Ayodeji O.

doi:10.3390/app16094316


by Moses U. Akujobi ¹,†, Abdulhameed U. Abubakar ², Raphael J. Mailabari ³, Iliya T. Thuku ¹, Saidu Y. Musa ¹, Ibrahim M. Visa ¹ and Ayodeji O. Abioye ¹,⁴,*,†

¹ Department of Electrical and Electronics Engineering, Modibbo Adama University, Yola 640001, Nigeria

² Department of Civil Engineering, Modibbo Adama University, Yola 640001, Nigeria

³ Department of Mechanical Engineering, Modibbo Adama University, Yola 640001, Nigeria

⁴ School of Computing and Communications, The Open University, Milton Keynes MK7 6AA, UK

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Appl. Sci. 2026, 16(9), 4316; https://doi.org/10.3390/app16094316

Submission received: 30 March 2026 / Revised: 22 April 2026 / Accepted: 24 April 2026 / Published: 28 April 2026

(This article belongs to the Section Transportation and Future Mobility)


Abstract

Accurate vehicle counts from traffic videos are fundamental to traffic measurement and to estimating roadway demand for infrastructure planning and maintenance. However, many vision-based traffic datasets and pretrained models under-represent vehicle types that are prevalent in developing countries, such as the keke (globally known as auto-rickshaw/three-wheeler), which can bias traffic composition estimates and downstream workload indicators. This paper presents a keke-aware vehicle detection and counting pipeline that combines fine-tuned YOLO-based detectors with BoT-SORT/ByteTrack tracking and ROI-based counting, together with a newly curated and publicly released traffic-video dataset that includes a dedicated keke class. The detectors are fine-tuned from pretrained weights on a six-class dataset (bicycle, bus, car, motorcycle, truck, keke) and evaluated on held-out roadside test videos with a manual counting baseline. On the validation split (2088 images; 8400 instances), the fine-tuned YOLO11l model achieves P = 0.752, R = 0.696, mAP@0.5 = 0.766, and mAP@0.5:0.95 = 0.578, with the keke class attaining mAP@0.5 = 0.772, while YOLO26l achieves slightly higher overall precision (P = 0.766) and stronger keke recall and mAP@0.5:0.95. In system-level counting, the selected tuned ROI-based variants produce the most reliable results on the Yola Road downward flow, where keke counts remain close to the manual baseline, but performance is strongly direction- and scene-dependent, with substantially larger errors in the Yola upward flow and the more challenging Mubi Road scene. Flow-rate and ESAL-rate analyses further show that class misclassification can severely distort pavement-loading estimates even when total traffic flow appears close to baseline, underscoring the need for localized class ontologies and robust heavy-vehicle discrimination in mixed-traffic ITS deployments. The released dataset and baseline pipeline provide a practical reference for keke-aware traffic monitoring and for infrastructure-relevant traffic measurement in developing-country contexts.

Keywords: traffic flow measurement; vehicle counting; YOLO; keke (auto-rickshaw); traffic-video dataset

1. Introduction

Reliable vehicle counts and class-specific traffic volumes are foundational to intelligent transportation systems and to evidence-based roadway planning, from capacity analysis and safety interventions to pavement design and maintenance scheduling. Yet in many developing countries, traffic monitoring remains constrained by manual surveys or limited-purpose sensors that are costly to deploy at scale and often struggle in dense, heterogeneous flow. A key challenge is that locally prevalent vehicle types, often absent or under-represented in widely used datasets and pretrained vision models [1], can be systematically misdetected or merged into coarse categories, reducing the accuracy of traffic measurement precisely where data scarcity is already a problem; this gap is especially visible for keke. In Nigeria, the three-wheeled vehicle known as keke—or more formally as keke napep—is a ubiquitous form of urban paratransit. Globally, this vehicle is referred to as an auto-rickshaw (or simply “three-wheeler”), a motorized rickshaw designed to carry three to four passengers for short-distance, low-speed trips. It has distinct size, appearance, and motion patterns and frequently operates in mixed traffic alongside motorcycles, minibuses, passenger cars, and heavy vehicles.

Recent advances in deep learning-based object detection have made video analytics a practical option for traffic monitoring, with YOLO-family models offering strong performance-speed tradeoffs suitable for near real-time deployment. However, detection alone is insufficient for reliable counting in unconstrained road scenes with occlusions and frequent interactions; robust counting typically requires a detection–tracking pipeline [2] that preserves object identities over time. In this work, we present a keke-aware vehicle counting system for traffic measurement that combines YOLO-based detection with multi-object tracking and ROI-based counting to support consistent counts at road cross-sections under real-world conditions. The approach is explicitly designed for mixed-traffic environments common in developing countries, where vehicle morphology varies widely, and lane discipline may be weak, making conventional counting assumptions fragile. Our field evaluation focuses on operational roadside video from two held-out scenes and emphasizes performance on the keke class while also reporting aggregate and class-wise counting behavior across the broader vehicle set.

To address the under-representation of kekes and similar classes in public traffic datasets, we also release a new annotated traffic-video dataset [3] that includes a dedicated keke class and reflects realistic roadway conditions in a developing-country context. The dataset and experiments are intended to provide a practical baseline for inclusive video-based traffic sensing: (i) training YOLO-based detectors to recognize keke reliably, (ii) integrating identity-preserving tracking and ROI-based filtering for counting, and (iii) validating performance on real road traffic scenes under different flow directions and scene conditions. By improving the fidelity of class-specific counts in mixed traffic, especially for vehicles that are often ignored by standard taxonomies, this study supports more accurate traffic volume estimation and, by extension, better-informed infrastructure planning and maintenance decisions in Nigeria and comparable settings.

2. Related Work

2.1. Conventional Vehicle Counting

Automated traffic monitoring has historically relied on fixed roadside or in-pavement sensors such as inductive loops, radar, infrared, and ultrasonic detectors. These systems can provide reliable point measurements, but they are often intrusive to install (especially in-pavement sensors), require lane closures for deployment and maintenance, and may degrade under harsh environmental conditions or infrastructure wear [4]. In many settings, especially across developing countries, traffic surveys are still conducted with manual tallies or short-duration counts due to cost, logistics, and limited coverage of installed sensors [5]. While manual and point-sensor approaches remain widely used, they can struggle to capture mixed and heterogeneous traffic streams (e.g., frequent lane changes, irregular headways, non-standard vehicle shapes).

Before deep learning became mainstream, video-based traffic monitoring commonly employed background subtraction and handcrafted visual features. Background modeling methods, such as Mixture-of-Gaussians (MoG) and Gaussian Mixture Model (GMM), were widely used to segment moving objects from static scenes [6,7]. Counting was then performed using heuristics based on predefined regions (virtual loops) or line-crossing rules (virtual detection lines), emulating physical loop detectors while avoiding road intrusion [8]. These classic pipelines are attractive for their simplicity and low cost; however, they are sensitive to illumination changes, shadows, camera vibration, heavy congestion, and partial occlusions, which can cause fragmented blobs and merged targets, ultimately degrading classification and count reliability [9].

2.2. Vision-Based Vehicle Counting

Vehicle counting from video is typically formulated in two broad ways: (i) virtual detection area methods that infer vehicle passage by changes in a defined region (e.g., occupancy of a loop-like zone), and (ii) tracking-based methods that detect vehicles and maintain identities over time, updating counts when trajectories cross a counting line or region boundary [9]. Detection-zone approaches can be computationally efficient and practical when traffic follows a stable lane structure, but they often miscount under lane changes, side-by-side motion, and dense mixed traffic where multiple vehicles simultaneously occupy the same region [9]. In contrast, tracking-based methods explicitly address temporal continuity by linking detections across frames, enabling robust counting under intermittent missed detections and short occlusions [2,10].

Recent comparative studies in traffic surveillance often report that the accuracy of the overall counting pipeline is strongly influenced by both the detector quality and the tracker’s association robustness, particularly under occlusion and low-light conditions [2,5]. Consequently, modern systems increasingly adopt a tracking-by-detection paradigm: a high-quality object detector generates per-frame bounding boxes and class scores, while a multi-object tracker performs state estimation and data association to produce consistent track IDs, which are then used to trigger line-crossing count events [11,12,13]. This paradigm is well aligned with operational roadway scenes, where vehicles frequently interact, overlap, and reappear after short-term occlusions.

2.3. YOLO-Based Vehicle Detection

Deep learning-based object detection has largely displaced handcrafted-feature pipelines due to its ability to learn robust representations directly from data and to generalize better across viewpoints and appearance variations [5]. Two-stage detectors (e.g., Faster R-CNN) offer strong accuracy by generating region proposals and refining them, but their computational cost can be limiting for edge or real-time deployments [14]. One-stage detectors, by contrast, predict bounding boxes and class probabilities in a single forward pass, making them attractive for real-time traffic analytics [15,16].

Among one-stage methods, the YOLO family has become particularly popular in transportation applications due to its favorable accuracy–speed trade-off and ease of integration into end-to-end pipelines. Successive YOLO variants introduced architectural improvements (e.g., multi-scale feature aggregation, optimized backbones/necks, improved label assignment strategies) and training “bag-of-freebies” techniques that improve accuracy without sacrificing throughput [17,18,19]. In traffic surveillance contexts, YOLO-based detectors have been used for vehicle detection, classification, and downstream tasks such as counting, speed estimation, and lane-level analytics [20]. Recent advances in remote sensing include the MCViM-YOLO framework, which leverages the YOLOv12 backbone and Mamba-based state-space models to achieve high precision in UAV-based vehicle detection (mAP@0.5 of 92.39%) [21], and YDFNet, which targets small vehicle detection by integrating the YOLOv10 algorithm for fast feature extraction, the BiFPN (Bidirectional Feature Pyramid Network) for multi-scale feature fusion, and the DETR (Detection Transformer) architecture for global feature modeling [22]. Nevertheless, performance still depends heavily on the representativeness of the training data and on robustness to real-world degradations (illumination changes, weather, motion blur, and severe occlusion) that are often under-modeled in curated benchmarks [5,23].

2.4. Multi-Object Tracking

Multi-object tracking (MOT) remains challenging under occlusions, abrupt motion, and appearance ambiguity, motivating a large body of work on online association and state estimation [13]. In practice, many video-analytics systems adopt lightweight online trackers due to deployment constraints. SORT introduced a simple yet effective online formulation using Kalman filtering and Hungarian matching on detection boxes [11]. Deep SORT extended SORT by incorporating appearance embeddings to reduce identity switches during occlusion and close-proximity interactions [12]. More recent trackers improve association robustness by leveraging low-confidence detections and more resilient matching strategies; ByteTrack, for example, associates nearly all detections (including low-score boxes) to recover trajectories through occlusion and reduce fragmentation [24].

BoT-SORT further advances tracking-by-detection by combining motion and appearance cues with camera-motion compensation and an improved Kalman filter state representation, achieving strong performance on standard MOT benchmarks [25]. These properties are especially relevant for roadside traffic videos where perspective effects, ego-motion (for non-fixed cameras), and frequent occlusions can break naive association rules. In vehicle counting, robust association reduces double-counting and missed counts caused by fragmented trajectories and ID switches, particularly near the counting line, where occlusion and overlap are common [2]. As a result, YOLO+MOT pipelines (e.g., YOLO with Deep SORT, ByteTrack, or BoT-SORT) are now a common pattern for operational traffic measurement systems.

2.5. Traffic Datasets

The performance of deep models is tightly coupled to dataset coverage. Widely used generic datasets such as PASCAL VOC and MS COCO have driven detector development and transfer learning, but their object taxonomies and scene distributions are not tailored to traffic measurement in mixed road environments [26,27]. Even within driving datasets (e.g., BDD100K, nuScenes), the dominant capture settings, infrastructure assumptions (lane structure, signage), and vehicle fleets are often shaped by a limited set of geographies [28,29]. While these resources have accelerated progress, they may still under-represent informal or region-specific vehicle categories that are prevalent in many developing countries (e.g., kekes, animal carts, locally modified minibuses), and which can be merged into coarse “car” or “motorcycle” labels when training detectors for traffic measurement.

To address unstructured and heterogeneous traffic environments, several datasets and studies explicitly target developing-country contexts. The India Driving Dataset (IDD) expands label sets and captures unstructured roadway scenes that violate common “structured driving” assumptions, highlighting the need for locally grounded taxonomies [30]. Follow-on efforts and complementary datasets for Indian traffic similarly note that standard benchmarks often omit classes such as rickshaws and other region-specific road users [31,32]. Related work in Bangladesh also emphasizes “native vehicle” recognition and the scarcity of region-specific detection datasets [33,34]. Collectively, these studies underscore an important gap in intelligent transportation systems (ITS) deployments: accurate traffic measurement requires not only high-performing detectors and trackers, but also datasets and class ontologies that reflect local fleets and mixed-traffic behavior.

This paper contributes to practical traffic measurement in developing-country contexts by (i) releasing a traffic-video dataset that explicitly includes a dedicated keke (auto-rickshaw/three-wheeler) class, and (ii) presenting a keke-aware vehicle counting pipeline that combines YOLO-based detection with a modern tracking-by-detection approach (BoT-SORT) and evaluates performance on real-world roadside traffic videos. In doing so, we aim to bridge the gap between state-of-the-art real-time detection/tracking methods and the representational needs of mixed-traffic environments where region-specific vehicles are common and under-modeled in mainstream benchmarks.

3. Materials and Methods

We propose a keke-aware vehicle counting pipeline for traffic measurement that combines (i) YOLO-based object detectors fine-tuned to recognize a dedicated keke class, and (ii) multi-object tracking (BoT-SORT and ByteTrack) to maintain consistent identities over time and support robust counting at a road cross-section. An ROI-based counting stage is applied to reduce spurious counts outside the measurement zone. The methodology comprises four stages: dataset construction and annotation, detector training and validation, video inference with tracking, and count/traffic-demand analysis using both model outputs and a manually produced baseline.

3.1. Dataset

To obtain broad coverage of common vehicle categories, we curated a vehicle-only subset from the MS COCO 2017 train/validation images containing instances of the following classes: bicycle, car, motorcycle, bus, truck. COCO is widely used for pretraining and transfer learning in object detection. This subset contains 19,230 images (note that an image can include multiple classes). Because keke is not a standard category in common international benchmarks, we assembled additional keke imagery from two sources: online images and our captured local road video footage. A total of 487 images were sourced from publicly available web datasets (e.g., keke collections hosted on platforms such as Kaggle). We annotated keke and other vehicle instances in these images. Our road videos were recorded at three cross-sections in Jimeta, Yola (Adamawa State, Nigeria): Mubi Roundabout, Police Roundabout, and Jimeta Main Market cross-section. Videos were converted into image frames using OpenCV; near-duplicate frames were removed; and keke and other visible vehicle classes were annotated. After deduplication, 704 labeled images were retained.

Combining the online and local sources yielded 1257 keke images containing 3963 keke annotations. These were merged with the COCO vehicle subset to produce a final dataset of 20,487 images spanning six classes: bicycle, car, motorcycle, bus, truck, keke. Table 1 summarizes the class coverage: for each class, it lists the number of images containing at least one instance of that class and the total number of annotated instances of that class in the dataset. The final dataset was split into 18,399 (90%) training images and 2088 (10%) validation images. All annotations were created and managed using Roboflow and exported in YOLO-compatible format.

3.2. Training

The fine-tuned YOLO11l and YOLO26l detectors were trained under the same configuration to ensure a controlled comparison. Both models used pretrained initialization and were trained for 100 epochs with batch size 16, input size $640 \times 640$, seed 0, deterministic execution, AMP enabled, and 8 workers. The optimizer was specified as Ultralytics automatic selection (optimizer=auto), with $lr_{0} = 0.01$, $lrf = 0.01$, momentum $0.937$, weight decay $5 \times 10^{-4}$, and a 3-epoch warm-up. Loss weights were 7.5 for box regression, 0.5 for classification, and 1.5 for distribution focal loss. The augmentation policy included HSV perturbation, translation 0.1, scale 0.5, horizontal flip 0.5, mosaic 1.0, and random erasing 0.4, with mosaic closed in the last 10 epochs; rotation, shear, perspective, vertical flip, mixup, cutmix, and copy-paste were disabled. Validation was performed on the held-out validation split each epoch, and best/last checkpoints were saved, with early stopping patience set to 100 epochs.
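The configuration above can be written down as a minimal sketch using the Ultralytics Python API. The hyperparameter values are those listed in the text; the weight file names (`yolo11l.pt`, `yolo26l.pt`) and the dataset file `keke_dataset.yaml` are assumptions for illustration, and the HSV magnitudes are left at library defaults because they are not stated. The `cutmix` argument is only accepted by recent Ultralytics releases, so this is not necessarily the exact call used by the authors.

```python
# Hyperparameters as listed in the training description.
# HSV augmentation magnitudes are not given in the text, so they are
# deliberately omitted here and fall back to the library defaults.
TRAIN_ARGS = dict(
    epochs=100, batch=16, imgsz=640, seed=0,
    deterministic=True, amp=True, workers=8,
    optimizer="auto", lr0=0.01, lrf=0.01,
    momentum=0.937, weight_decay=5e-4, warmup_epochs=3,
    box=7.5, cls=0.5, dfl=1.5,                      # loss weights
    translate=0.1, scale=0.5, fliplr=0.5,           # enabled augmentations
    mosaic=1.0, erasing=0.4, close_mosaic=10,
    degrees=0.0, shear=0.0, perspective=0.0,        # disabled augmentations
    flipud=0.0, mixup=0.0, cutmix=0.0, copy_paste=0.0,
    patience=100,                                   # early-stopping patience
)


def train(weights: str, data_yaml: str):
    """Fine-tune one detector; both arguments are hypothetical placeholders."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without the package

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)


# Example call (assumed file names, not taken from the paper):
#   for w in ("yolo11l.pt", "yolo26l.pt"):
#       train(w, "keke_dataset.yaml")
```

Keeping one shared argument dictionary for both detectors mirrors the controlled-comparison setup described above, since only the initial weights differ between runs.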

Model training was performed on a high-performance computing (HPC) workstation with an Intel Core i9-14900K CPU, 64 GB of DDR5 RAM, and an NVIDIA GeForce RTX 4070 Ti Super GPU with 16 GB of GDDR6X VRAM, running Ubuntu 24.04 LTS.

3.3. Inference

Vehicle counting test and traffic-flow evaluation were performed using two held-out roadside videos contained in the dataset test split: the 10-min yola_road.mp4 video and the 7-min mubi_road.mp4 video. Neither video was used during training or validation. The same videos are used across the evaluated pipeline variants to enable a paired comparison under identical scene conditions. Figure 1 shows a tuned-model detection on a sample frame from the Yola Road test video, while Figure 2 shows a corresponding sample from the Mubi Road test video.

While benchmark studies indicate that YOLO-based detector performance across specific vehicle categories can vary significantly depending on class distribution [35], performance in unconstrained Nigerian road scenes is additionally sensitive to occlusions and close vehicle interactions. To account for this variability, we adopted a tracking-by-detection design in which per-frame detections are associated with trajectories using BoT-SORT or ByteTrack. We use Ultralytics track() mode with botsort.yaml or bytetrack.yaml and fixed confidence (conf=0.25) and IoU (iou=0.45) thresholds.
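As a hedged illustration of this tracking-by-detection step, the sketch below streams a video through the Ultralytics `track()` mode with the thresholds stated above and yields one centroid per tracked box. The function names `centroid` and `iter_tracks`, and the tuple layout they produce, are assumptions introduced here; the released repository may organize this differently.

```python
from typing import Iterator, Sequence, Tuple


def centroid(xyxy: Sequence[float]) -> Tuple[float, float]:
    """Centre of an (x1, y1, x2, y2) bounding box in image coordinates."""
    x1, y1, x2, y2 = (float(v) for v in xyxy)
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def iter_tracks(weights: str, video: str,
                tracker: str = "botsort.yaml") -> Iterator[tuple]:
    """Yield (frame_idx, track_id, class_id, conf, cx, cy) for each tracked box.

    `weights` and `video` are caller-supplied paths; pass
    tracker="bytetrack.yaml" to switch to ByteTrack.
    """
    from ultralytics import YOLO  # lazy import: only needed when tracking is run

    model = YOLO(weights)
    results = model.track(source=video, tracker=tracker,
                          conf=0.25, iou=0.45, stream=True, verbose=False)
    for frame_idx, r in enumerate(results):
        boxes = r.boxes
        if boxes is None or boxes.id is None:  # no tracked objects in this frame
            continue
        for xyxy, tid, cls, conf in zip(boxes.xyxy.tolist(), boxes.id.tolist(),
                                        boxes.cls.tolist(), boxes.conf.tolist()):
            cx, cy = centroid(xyxy)
            yield frame_idx, int(tid), int(cls), float(conf), cx, cy
```

Streaming the results keeps memory use bounded on multi-minute videos, and emitting centroids directly matches the quantity that the ROI membership test operates on.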

Counting is performed at a road cross-section represented by a rectangular region-of-interest (ROI) in image coordinates, shown as a yellow rectangle in Figure 1 and Figure 2. Let $R$ be the polygonal ROI. For each tracked object $j$ at frame $t$, we compute the centroid of its bounding box $(c_{x}, c_{y})$ and determine membership using a point-in-polygon test. We define a count event as the start of a track segment when an object enters $R$:

$\mathrm{enter}(j) = \min \{ t \mid (c_{x}(t), c_{y}(t)) \in R \land (c_{x}(t-1), c_{y}(t-1)) \notin R \}$

(1)

An exit event is analogously defined when the centroid leaves $R$. Each (enter, exit) pair forms a region segment with a dwell time duration. The implementation stores, per segment, the entry/exit timestamps, entry/exit centroid positions, and confidence statistics aggregated while inside the ROI.
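The entry and exit logic can be sketched as follows, assuming a standard ray-casting point-in-polygon test and a track supplied as time-ordered (timestamp, centroid) samples. The function names and the dictionary keys are hypothetical. Following Equation (1) literally, a track whose first sample is already inside the ROI produces no entry event, and a segment that never exits is discarded; the released implementation may treat these edge cases differently.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting test: True if p lies strictly inside the polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def roi_segments(track: Sequence[Tuple[float, Point]],
                 roi: Sequence[Point]) -> List[dict]:
    """Split one track into (enter, exit) region segments."""
    segments: List[dict] = []
    prev_inside: Optional[bool] = None  # unknown before the first sample
    current: Optional[dict] = None
    for t, c in track:
        inside = point_in_polygon(c, roi)
        if inside and prev_inside is False:        # outside -> inside: entry
            current = {"t_enter": t, "c_enter": c}
        elif not inside and prev_inside and current is not None:  # inside -> outside: exit
            current.update(t_exit=t, c_exit=c, duration=t - current["t_enter"])
            segments.append(current)
            current = None
        prev_inside = inside
    return segments


# Illustrative rectangle and track (made-up coordinates).
roi = [(0, 0), (10, 0), (10, 10), (0, 10)]
track = [(0.0, (5, -5)), (0.1, (5, 2)), (0.2, (5, 8)), (0.3, (5, 12))]
segments = roi_segments(track, roi)
```

Because a single track can produce several segments, an object that leaves and re-enters the ROI is represented by more than one count candidate, which is one reason the dwell-time and confidence post-filters matter.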

To support directional flow analysis, each counted segment is assigned a direction using the sign of vertical centroid displacement between entry and exit:

$\Delta y = y_{exit} - y_{enter},$

(2)

with $\Delta y < 0$ indicating upward flow and $\Delta y > 0$ indicating downward flow (with a separate “stationary” category when $\Delta y = 0$). This logic is used consistently in our analysis plots.
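A direct transcription of Equation (2) is given below as a small sketch; the function name and the returned labels are assumptions. Since image y coordinates increase toward the bottom of the frame, a negative displacement corresponds to motion toward the top of the image.

```python
def flow_direction(y_enter: float, y_exit: float) -> str:
    """Classify a region segment by the sign of its vertical displacement."""
    dy = y_exit - y_enter
    if dy < 0:
        return "upward"      # centroid moved toward the top of the frame
    if dy > 0:
        return "downward"    # centroid moved toward the bottom of the frame
    return "stationary"
```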

To reduce spurious tracks and stationary artifacts, we apply post-filters to region segments: minimum dwell time (duration $\geq 0.15$ s) to remove noisy object tracks, maximum dwell time (duration $\leq 10$ s) to remove long stationary objects in ROI, and a confidence filter (mean in-region confidence $\geq 0.4$). These filters are applied identically across the evaluated pipeline variants.
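These three post-filters amount to a single predicate per region segment. The sketch below uses the thresholds stated above; the constant names and the segment keys `duration` and `mean_conf` are hypothetical.

```python
from typing import Iterable, List

MIN_DWELL_S = 0.15    # shorter segments are treated as noisy tracks
MAX_DWELL_S = 10.0    # longer segments are treated as stationary objects
MIN_MEAN_CONF = 0.4   # minimum mean detection confidence inside the ROI


def keep_segment(duration_s: float, mean_conf: float) -> bool:
    """True if a region segment passes all three post-filters."""
    return MIN_DWELL_S <= duration_s <= MAX_DWELL_S and mean_conf >= MIN_MEAN_CONF


def filter_segments(segments: Iterable[dict]) -> List[dict]:
    """Keep only the segments that pass the dwell-time and confidence filters."""
    return [s for s in segments if keep_segment(s["duration"], s["mean_conf"])]
```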

To obtain a baseline for system-level performance, a human annotator manually counted the number of vehicles entering the ROI in the same Yola Road and Mubi Road videos. Counts were logged every 0.2 s (5 Hz) with the fields: timestamp, vehicle class, travel direction. This serves as the reference for comparing the counting outputs of the evaluated pipeline variants.
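For illustration, a manual log with these three fields could be reduced to per-direction, per-class baseline counts as in the sketch below. The column names and the sample rows are invented for the example and are not taken from the released baseline files.

```python
import csv
import io
from collections import Counter


def tally_baseline(csv_text: str) -> Counter:
    """Count logged vehicles per (direction, vehicle_class) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["direction"], row["vehicle_class"]) for row in reader)


# Made-up rows; timestamps fall on the 0.2 s logging grid.
SAMPLE_LOG = """timestamp,vehicle_class,direction
0.2,keke,downward
0.2,car,downward
1.4,keke,upward
3.0,keke,downward
"""

baseline_counts = tally_baseline(SAMPLE_LOG)
```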

3.4. Evaluation

3.4.1. Model Performance

Detection performance was measured on the validation split using standard object detection metrics (precision, recall, and mean Average Precision, including COCO-style mAP across IoU thresholds) as reported by the YOLO training framework.

For counting performance on the test videos, we compared the selected tuned ROI-based pipeline variant counts against the manual baseline using:

(a) the absolute count error per class,

$e_{i} = |\hat{N}_{i} - N_{i}|$

(3)

where $N_{i}$ is the baseline count for class $i$ in a given flow direction and $\hat{N}_{i}$ is the corresponding model output for a selected tuned ROI-based pipeline variant;

(b) the mean absolute error,

$MAE = \frac{1}{K} \sum_{i = 1}^{K} e_{i}$

(4)

(c) the root mean square error for aggregate errors across the four classes,

$RMSE = \sqrt{\frac{1}{K} \sum_{i = 1}^{K} (\hat{N}_{i} - N_{i})^{2}}$

(5)

(d) the mean absolute percentage error,

$MAPE = \frac{100}{|I|} \sum_{i \in I} \frac{e_{i}}{N_{i}}$

(6)

(e) and the weighted absolute percentage error,

$WAPE = \frac{\sum_{i = 1}^{K} |\hat{N}_{i} - N_{i}|}{\sum_{i = 1}^{K} |N_{i}|}$

(7)

where $I = \{i : N_{i} > 0\}$ (i.e., classes with non-zero baseline to avoid division by zero), and $K$ is the number of classes. We also analyze temporal stability via cumulative count curves and per-class distributions, generated from the exported region-segment CSV outputs.
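Equations (3)–(7) can be computed together as in the following sketch. The function name and the dictionary-based interface are assumptions; the example counts are the Yola Road downward-flow values reported in Section 4.2 for the YOLO11l tuned + BoT-SORT + ROI pipeline. The sketch assumes at least one class has a non-zero baseline count.

```python
import math
from typing import Dict


def count_metrics(pred: Dict[str, int], base: Dict[str, int]) -> Dict[str, float]:
    """MAE, RMSE, MAPE (%) and WAPE of predicted counts against baseline counts."""
    classes = list(base)
    k = len(classes)
    # Eq. (3): absolute count error per class (a missing prediction counts as 0).
    err = {c: abs(pred.get(c, 0) - base[c]) for c in classes}
    # Index set I of Eq. (6): classes with a non-zero baseline.
    nonzero = [c for c in classes if base[c] > 0]
    return {
        "MAE": sum(err.values()) / k,                                            # Eq. (4)
        "RMSE": math.sqrt(sum(e * e for e in err.values()) / k),                 # Eq. (5)
        "MAPE": 100.0 / len(nonzero) * sum(err[c] / base[c] for c in nonzero),   # Eq. (6)
        "WAPE": sum(err.values()) / sum(abs(n) for n in base.values()),          # Eq. (7)
    }


baseline = {"bus": 6, "car": 108, "keke": 181, "truck": 23}
model_counts = {"bus": 1, "car": 129, "keke": 186, "truck": 2}
metrics = count_metrics(model_counts, baseline)
```

For these counts, the WAPE evaluates to 52/318 ≈ 0.1635, matching the worked example in Section 4.2, and the MAE is 13 vehicles per class.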

3.4.2. Road Usage

To connect vehicle counts to interpretable transportation engineering quantities, we conducted the following standardized measurements.

Flow Rate (veh/h)

For an observation window of length $T$ seconds, the per-class flow rate is:

$q_{i} = \frac{3600}{T} N_{i} \; [\mathrm{veh/h}],$

(8)

and the total flow is:

$Q = \sum_{i} q_{i}$

(9)

This provides a direct and comparable measure across baseline, default, and tuned models.

Equivalent Single Axle-Load Rate

Traffic loading is commonly represented via the Equivalent Single Axle Load (ESAL), which is used in pavement design and maintenance planning and is typically based on the 18-kip (80 kN or 8.16 tons) standard axle concept. Here we report it as an hourly rate (ESAL/h). The class-specific load equivalency factors $LEF_{i}$ are computed as:

$LEF_{i} = \left(\frac{\text{Axle load}}{\text{Standard reference load}}\right)^{4}$

(10)

where the standard reference load is $8.16$ tons. Given these factors, the ESAL rate is computed as:

$R_{ESAL} = \sum_{i} \left(\frac{3600}{T} N_{i}\right) \cdot LEF_{i} \; [\mathrm{ESAL/h}]$

(11)
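Equations (8)–(11) can be sketched as follows. The per-class axle loads in `AXLE_LOADS_TONS` are hypothetical placeholders chosen only to make the example run, not values from this study, and the 600 s window assumes that counts cover the full 10-min Yola Road video. The counts are the Yola Road downward-flow values reported in Section 4.2.

```python
from typing import Dict

STANDARD_AXLE_TONS = 8.16  # standard reference load from Eq. (10)


def flow_rates(counts: Dict[str, int], window_s: float) -> Dict[str, float]:
    """Eq. (8): per-class flow rate in veh/h."""
    return {c: 3600.0 / window_s * n for c, n in counts.items()}


def lef(axle_load_tons: float) -> float:
    """Eq. (10): fourth-power load equivalency factor."""
    return (axle_load_tons / STANDARD_AXLE_TONS) ** 4


def esal_rate(counts: Dict[str, int], window_s: float,
              axle_loads_tons: Dict[str, float]) -> float:
    """Eq. (11): ESAL rate in ESAL/h."""
    q = flow_rates(counts, window_s)
    return sum(q[c] * lef(axle_loads_tons[c]) for c in counts)


# HYPOTHETICAL axle loads (tons), for illustration only.
AXLE_LOADS_TONS = {"bus": 5.0, "car": 1.0, "keke": 0.4, "truck": 8.16}
WINDOW_S = 600.0  # assumption: the whole 10-min video is the observation window

baseline = {"bus": 6, "car": 108, "keke": 181, "truck": 23}
model_counts = {"bus": 1, "car": 129, "keke": 186, "truck": 2}

q_total_baseline = sum(flow_rates(baseline, WINDOW_S).values())    # Eq. (9)
q_total_model = sum(flow_rates(model_counts, WINDOW_S).values())
esal_baseline = esal_rate(baseline, WINDOW_S, AXLE_LOADS_TONS)
esal_model = esal_rate(model_counts, WINDOW_S, AXLE_LOADS_TONS)
```

Under these assumptions both totals come to 1908 veh/h, because the model and baseline each sum to 318 vehicles, yet the ESAL rate from the model counts is far lower because most trucks are missed. Any heavier-than-car loading assumed for trucks and buses produces the same qualitative gap, which illustrates why total flow alone can mask loading errors.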

The training, inference, and evaluation code is publicly available in the GitHub repository, https://github.com/abioyeayo/Keke-Aware-Vehicle-Counting, accessed on 23 April 2026, and the full annotated dataset (including the keke class) is published on Figshare [3]. The counting/evaluation pipeline exports per-frame detections and ROI segments to CSV and JSON for transparent auditing and replication.

4. Results

The fine-tuned YOLO11l and YOLO26l models were each trained for 100 epochs from pretrained weights, and the final validation was performed using the corresponding best checkpoint (best.pt). We present the results of detector training and validation first, followed by the ablation WAPE analysis, and then the system-level counting performance evaluation for the selected tuned ROI-based pipeline variants. The latter analysis covers the held-out Yola Road and Mubi Road test videos and includes class-wise counting errors, aggregate error metrics, temporal count behavior, and downstream road-usage indicators such as flow rate and ESAL rate.

4.1. Model Evaluation

4.1.1. Training Result

Figure 3 and Figure 4 summarize the learning dynamics of the fine-tuned YOLO11l and YOLO26l detectors across 100 epochs. For both models, all three optimization losses (box regression, classification, and distribution focal loss) decrease consistently on the training split, indicating progressively improved localization and class discrimination. The validation losses exhibit a sharp reduction within the first few epochs and then stabilize, suggesting rapid convergence to a well-generalizing solution. Importantly, the absence of a widening gap between training and validation losses in the late epochs indicates no obvious divergence or unstable fitting behavior under the chosen training configuration. In parallel, the validation metrics improve in the expected manner: precision rises quickly and stabilizes in mid-to-late training, while recall increases more gradually toward the final epochs. The mAP curves follow the same trend, with both mAP@0.5 and mAP@0.5:0.95 increasing rapidly early in training and then plateauing, reflecting diminishing returns once the models reach stable optima.

The comparison in Table 2 shows that YOLO26l achieved slightly higher precision for all classes and improved keke recall and mAP@0.5:0.95, whereas YOLO11l retained slightly higher overall recall and stronger mAP for several common vehicle classes, especially bus, car, motorcycle, and truck.

4.1.2. Model Validation

Detection performance is evaluated on the held-out validation split using standard metrics reported by the Ultralytics YOLO training pipeline: precision (P), recall (R), mean Average Precision at IoU 0.5 (mAP@0.5), and COCO-style mean Average Precision averaged over IoU thresholds 0.5:0.95 (mAP@0.5:0.95). On our validation set (2088 images; 8400 labelled instances), the fine-tuned YOLO11l model achieved an overall P = 0.752, R = 0.696, mAP@0.5 = 0.766, and mAP@0.5:0.95 = 0.578, while the fine-tuned YOLO26l model achieved P = 0.766, R = 0.693, mAP@0.5 = 0.757, and mAP@0.5:0.95 = 0.571. Per-class performance is reported in Table 2, with particular emphasis on the proposed keke class, for which both detectors achieve competitive localization quality and recall (YOLO11l: P = 0.649, R = 0.732, mAP@0.5 = 0.772, mAP@0.5:0.95 = 0.685; YOLO26l: P = 0.653, R = 0.780, mAP@0.5 = 0.770, mAP@0.5:0.95 = 0.688).

The class-wise results in Table 2 show performance differences that are consistent with both dataset composition and scene-level visual complexity. The bus class achieves the strongest scores for both detectors, which is expected because buses are comparatively large objects with distinctive shapes and less ambiguity against background clutter, making them easier to localize and classify. By contrast, truck exhibits the lowest recall and lower mAP@0.5:0.95 values, likely reflecting greater intra-class variability (e.g., pickups, vans, articulated and cargo trucks), frequent partial occlusions in dense traffic, and truncation at image boundaries, all of which can reduce detection confidence and increase localization errors. The car class shows moderate performance despite having the most instances, which may be explained by fine-grained appearance diversity (sedans, SUVs, taxis, varying viewpoints) and heavy overlap with adjacent vehicles in congested scenes. Notably, the proposed keke class attains competitive localization quality and relatively high recall in both detectors, indicating that the added class can be learned effectively despite having fewer validation samples (66 images; 209 instances). YOLO26l provides slightly stronger keke recall and mAP@0.5:0.95, whereas YOLO11l remains slightly stronger on several of the more common vehicle classes.

The normalized confusion matrices, shown in Figure 5 and Figure 6, indicate strong diagonal dominance across most classes for both fine-tuned detectors. For YOLO11l, the correct-class prediction rates are 0.84 (bus), 0.76 (keke), 0.74 (motorcycle), 0.71 (car), and 0.69 (bicycle), and the overall pattern is broadly consistent with the class-wise validation metrics in Table 2. The main failure mode is missed detections, reflected by the background row (true objects predicted as background), which is most pronounced for bicycle, truck, car, and motorcycle, consistent with small targets, partial occlusions, and cluttered scenes. Among inter-class confusions, truck instances being predicted as cars remain notable, likely due to intra-class variability in trucks (e.g., pickups/vans vs. heavy trucks). On the false-positive side (true background column), predictions are dominated by car and truck, suggesting that background structures or partial vehicle fragments are occasionally interpreted as vehicles. The YOLO26l confusion matrix shows a similar structure, with strong class recovery overall and behavior consistent with its slightly higher precision and stronger keke recall.

4.2. Ablation Analysis

We evaluated multiple counting pipeline configurations formed by combining detector variant (default or tuned), tracker (BoT-SORT or ByteTrack), and counting-region mode (ROI or full frame).

We use the Weighted Absolute Percentage Error (WAPE) to compare the pipeline configurations. WAPE measures the total absolute error relative to the total true volume. Figure 7 shows a heatmap of the overall counting error across vehicle classes, computed using Equation (7) for each scene and travel direction. Lower WAPE values indicate better agreement with the manual baseline.

For example, applying the YOLO11l tuned detector + BoT-SORT + ROI counting pipeline to the downward traffic flow of the yola_road test video yielded bus = 1, car = 129, keke = 186, and truck = 2. Given baseline counts of bus = 6, car = 108, keke = 181, and truck = 23, we compute the absolute count error per class as

e = (| 1 - 6 |, | 129 - 108 |, | 186 - 181 |, | 2 - 23 |) = (5, 21, 5, 21)

and the weighted absolute percentage error as

WAPE = \frac{5 + 21 + 5 + 21}{6 + 108 + 181 + 23} = \frac{52}{318} = 0.1635

The WAPE value of 0.1635 means that the total absolute counting error is about 16.35% of the total true traffic volume. Figure 7 shows the heatmap and computed values for the 32 experiment runs, consisting of 8 detection pipelines (2 detectors × 2 trackers × 2 counting regions) × 4 test conditions (2 test video scenes × 2 directions). However, WAPE can be dominated by frequent classes such as car or keke, so we also report per-class errors of the best-performing pipeline variants, along with keke-specific metrics. For subsequent analysis, we therefore focus on the following four pipeline variants:

P1: YOLO11l tuned + BoT-SORT + ROI
P2: YOLO11l tuned + ByteTrack + ROI
P3: YOLO26l tuned + BoT-SORT + ROI
P4: YOLO26l tuned + ByteTrack + ROI
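The WAPE worked example in this subsection can be reproduced with a few lines of Python. This is a minimal sketch rather than the authors' evaluation code: the function name `wape` and the dictionary layout are illustrative assumptions, and only the counts are taken from the text.

```python
def wape(predicted, baseline):
    """Total absolute count error divided by total baseline volume."""
    classes = baseline.keys()
    total_abs_error = sum(abs(predicted[c] - baseline[c]) for c in classes)
    total_true = sum(baseline[c] for c in classes)
    if total_true == 0:
        # No true vehicles in the window: the ratio has no meaning.
        raise ValueError("WAPE is undefined when the baseline volume is zero")
    return total_abs_error / total_true


# Counts quoted in the text for the yola_road downward flow.
baseline = {"bus": 6, "car": 108, "keke": 181, "truck": 23}
p1_counts = {"bus": 1, "car": 129, "keke": 186, "truck": 2}

print(f"WAPE = {wape(p1_counts, baseline):.4f}")  # WAPE = 0.1635
```

Because the denominator is the total true volume, a large error on a rare class (such as truck here) and a moderate error on a frequent class (car) can contribute equally, which is why per-class errors are reported separately.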

4.3. Performance Evaluation

For this evaluation, we focus on four vehicle classes (bus, car, keke, truck). The remaining classes (bicycle and motorcycle) were rarely observed in the test videos and, more importantly, have comparatively small contributions to pavement loading relative to heavier vehicles; excluding them reduces noise in the workload-oriented analysis while preserving the dominant traffic demand signals. Counts produced by the higher-performing pipeline variants were compared against a manual baseline created by human annotation within the same ROI and time window. Figure 8 shows a comparison of the manual baseline counts with the counting by the selected pipeline variants (P1–P4).

4.3.1. Vehicle Counting Analysis

For the Yola Road downward panel, which represents the easier daytime and high-visibility scene, all four selected pipeline variants recover the dominant car and keke flows reasonably well, but the YOLO11l-based ROI variants remain the closest to the baseline. In particular, YOLO11l + BoT-SORT + ROI predicts 129 cars and 186 keke against manual counts of 108 and 181, respectively, while YOLO11l + ByteTrack + ROI predicts 127 cars and 187 keke. The YOLO26l ROI variants are still competitive but show slightly larger deviations in this panel, especially through complete bus misses and larger keke under-counting for the ByteTrack version. This pattern is also reflected in the aggregate downward-flow error summaries, where the Yola downward overall WAPE is lower for the YOLO11l ROI variants than for the corresponding YOLO26l ROI variants.

The Yola Road upward panel reveals the main directional weakness of the counting system. Here, all four variants depart substantially from the manual distribution, primarily through car over-counting and keke under-counting. YOLO11l + BoT-SORT + ROI predicts 241 cars and only 79 keke relative to manual counts of 106 cars and 206 keke, while YOLO11l + ByteTrack + ROI moderates this imbalance to 219 cars and 93 keke. The YOLO26l variants do not remove the asymmetry: YOLO26l + BoT-SORT + ROI produces 45 buses, 209 cars, and 38 keke, and YOLO26l + ByteTrack + ROI produces 50 buses, 194 cars, and 41 keke, again relative to a baseline of zero buses, 106 cars, and 206 keke. Consistent with the grouped bars, the Yola Road upward overall WAPE remains high for all selected ROI variants, ranging from about 0.74 for YOLO11l + ByteTrack + ROI to about 0.97 for YOLO26l + BoT-SORT + ROI.

In the Mubi Road panels, which correspond to evening operation under cloudier, lower-visibility, and more congested conditions, the grouped bars show substantially larger departures from the manual baseline. The YOLO11l variants exhibit severe distortion of the class composition, especially through large car over-counting and strong keke under-counting. For example, YOLO11l + ByteTrack + ROI predicts 125 cars and only 20 keke in downward flow, compared with manual counts of 35 cars and 156 keke, and in upward flow it predicts 114 cars and 23 keke, compared with manual counts of 17 cars and 120 keke. The YOLO26l variants remain imperfect, but they visibly reduce this distortion: YOLO26l + ByteTrack + ROI predicts 65 cars and 63 keke in downward flow, and 40 cars and 44 keke in upward flow, bringing the class distribution closer to the baseline than the YOLO11l-based alternatives. This improvement is also reflected in the summary metrics, where the Mubi upward overall WAPE decreases from 1.45 for YOLO11l + BoT-SORT + ROI and 1.39 for YOLO11l + ByteTrack + ROI to 0.85 for YOLO26l + BoT-SORT + ROI and 0.81 for YOLO26l + ByteTrack + ROI.

Overall, the selected grouped-bar figure supports three conclusions. First, ROI-based tuned counting is viable in the easier Yola Road downward setting, where all four selected variants broadly reproduce the dominant class composition. Second, the strongest empirical limitation remains direction-dependent instability, particularly for upward flow, where rear-view similarity and tracking fragmentation lead to persistent confusion between keke and car. Third, in the more difficult Mubi Road scene, the YOLO26l-based variants provide a more favorable trade-off than the YOLO11l-based variants, especially when coupled with ByteTrack, but they still fall short of reliable class-balanced counting under adverse field conditions. These observations show that the proposed framework has practical value as a calibrated, keke-aware traffic composition measurement tool for roadside field studies, especially in settings where local vehicle mix is more informative than aggregate flow alone. Accordingly, the contribution of this work is not a claim of universally robust counting, but the introduction and validation of a reproducible composition-aware counting framework that can support transport analysis, reveal failure modes, and serve as a foundation for subsequent calibration and deployment studies.

4.3.2. Error Analysis

Using the class-wise totals from the downward and upward traffic-flow distribution plots (Figure 8), we quantify counting accuracy relative to the manual baseline for the four dominant classes (bus, car, keke, truck). For the downward traffic flow of the Yola Road test video (baseline: bus = 6, car = 108, keke = 181, truck = 23), the tuned P1 (YOLO11l + BoT-SORT + ROI) pipeline variant predicts (1, 129, 186, 2). We compute the absolute count error per class as

e = (| 1 - 6 |, | 129 - 108 |, | 186 - 181 |, | 2 - 23 |) = (5, 21, 5, 21)

mean absolute error as

MAE = \frac{5 + 21 + 5 + 21}{4} = 13.00

root mean square error as

RMSE = \sqrt{\frac{5^{2} + 21^{2} + 5^{2} + 21^{2}}{4}} = \sqrt{\frac{932}{4}} = \sqrt{233} = 15.26

and mean absolute percentage error

MAPE = \frac{100}{4} \left(\frac{5}{6} + \frac{21}{108} + \frac{5}{181} + \frac{21}{23}\right) = 49.21\%

The same procedure is applied to the remaining selected tuned ROI-based variants for both traffic directions in the Yola Road and Mubi Road test videos. Table 3 and Table 4 report the resulting predicted counts, absolute errors, and APE values for P1 (11L-BoT-ROI), P2 (11L-Byte-ROI), P3 (26L-BoT-ROI), and P4 (26L-Byte-ROI).
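The MAE, RMSE, and MAPE calculations above can be checked numerically. The sketch below is illustrative only: the helper name `count_errors` is an assumption, and it assumes every baseline count is nonzero, as is the case for the Yola Road downward flow.

```python
import math


def count_errors(predicted, baseline):
    """Return (MAE, RMSE, MAPE in percent) over paired class counts."""
    abs_err = [abs(p - b) for p, b in zip(predicted, baseline)]
    n = len(abs_err)
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # Each class contributes its own relative error, so rare classes weigh
    # as much as frequent ones (unlike WAPE).
    mape = 100.0 / n * sum(e / b for e, b in zip(abs_err, baseline))
    return mae, rmse, mape


# Class order: bus, car, keke, truck (Yola Road downward flow, P1).
baseline = [6, 108, 181, 23]
p1 = [1, 129, 186, 2]

mae, rmse, mape = count_errors(p1, baseline)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, MAPE = {mape:.2f}%")
# MAE = 13.00, RMSE = 15.26, MAPE = 49.21%
```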

For the Yola Road test video, the smallest relative error is achieved on keke in the downward flow (Table 3) for 11L-BoT-ROI (APE = 2.76%). Across all the vehicle classes in the downward flow, keke APE remains low (2.76–9.39%), car APE remains moderate (17.59–23.15%), whereas heavy vehicles and low-count categories show much larger percentage deviations (truck APE = 86.96–91.30% and bus APE = 83.33–100.00%). In the upward flow (Table 3), car APE ranges from 83.02% to 127.36%, keke APE from 54.85% to 81.55%, and truck APE from 25.00% to 85.00%. Among these, 26L-Byte-ROI gives the lowest upward car APE (83.02%), 11L-Byte-ROI gives the lowest upward keke APE (54.85%), and 26L-BoT-ROI gives the lowest upward truck APE (25.00%). For the bus class in the upward flow direction, the manual baseline is zero, so APE is undefined for all four counting pipeline variants.

Table 4 shows substantially larger deviations for the Mubi Road test video. In the downward flow, keke APE ranges from 59.62% to 87.18% and car APE from 85.71% to 288.57%; the lowest values in both cases are obtained by 26L-Byte-ROI. In the upward flow, keke APE remains high (63.33–83.33%), and car APE remains very high (135.29–588.24%), again with 26L-Byte-ROI giving the lowest car and keke APEs. Truck is the only class for which some Mubi upward variants attain comparatively lower error, with APE = 20.00% for both 11L-Byte-ROI and 26L-BoT-ROI; however, downward truck errors remain extreme for all variants (1400.00–2000.00%). Also, the upward bus APE is undefined because the manual baseline is zero.
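The undefined-APE case for a zero baseline can be handled explicitly rather than by dividing by zero. The following sketch uses the Yola Road upward counts quoted in Section 4.3.1 for the YOLO26l + BoT-SORT + ROI variant; the truck class is left out because its counts are not quoted in the running text, and the function name `ape` is an illustrative assumption.

```python
def ape(predicted, baseline):
    """Absolute percentage error in percent, or None if the baseline is zero."""
    if baseline == 0:
        return None  # relative error has no meaning without true vehicles
    return 100.0 * abs(predicted - baseline) / baseline


# Yola Road upward flow: manual baseline versus 26L-BoT-ROI predictions.
baseline = {"bus": 0, "car": 106, "keke": 206}
p3_counts = {"bus": 45, "car": 209, "keke": 38}

per_class = {c: ape(p3_counts[c], baseline[c]) for c in baseline}
for cls, value in per_class.items():
    print(cls, "undefined" if value is None else f"{value:.2f}%")
# bus undefined
# car 97.17%
# keke 81.55%
```

Returning `None` keeps the 45 spurious bus counts visible as an absolute error while preventing them from entering a percentage average; any aggregate built on top of this has to state whether undefined classes are skipped.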

Overall, the class-wise tables show that the selected tuned ROI-based pipelines are much more reliable on the Yola Road downward stream than on the Yola Road upward stream or either Mubi Road stream. The most stable result across the four road-direction scenarios is downward keke counting on Yola Road, whereas the most severe failure modes occur in the Mubi video and in low-count categories such as bus and truck, where small absolute differences lead to very large percentage errors. These results suggest the need for further keke-aware tuning to address the scene- and direction-dependent failure modes through additional data collection and further refinement of the counting pipeline.

Table 5 shows that aggregate counting accuracy is strongest for the Yola Road downward flow, where all four selected tuned ROI-based pipeline variants remain within a narrow error band (MAE = 12.75–17.00, RMSE = 14.69–18.37, MAPE = 48.89–54.87%), with YOLO11l + ByteTrack + ROI giving the lowest overall error. For the Yola Road upward flow, errors increase substantially, although YOLO11l + ByteTrack + ROI again yields the lowest MAE and RMSE (61.75 and 80.25), while YOLO26l + BoT-SORT + ROI gives the lowest MAPE (67.91%). The Mubi Road video is markedly more challenging: downward-flow errors are high for all variants, with YOLO26l + ByteTrack + ROI achieving the lowest MAE and RMSE (46.00 and 53.44), but percentage errors remain very large because low-count classes strongly inflate MAPE. In the upward Mubi flow, the YOLO26l-based variants clearly outperform the YOLO11l-based variants, with YOLO26l + ByteTrack + ROI giving the best overall aggregate result (MAE = 28.75, RMSE = 40.33, MAPE = 79.54%). Overall, the table confirms strong scene and direction-dependence, with Yola Road downward being the most reliable setting and Mubi, especially downward, remaining the most difficult.

4.3.3. Flow-Rate and ESAL Analysis

For the road usage analysis, we computed the flow rate and ESAL rate to evaluate the implications for traffic measurement and pavement loading. The flow rate is computed class-wise using Equation (8) and then aggregated using Equation (9) to obtain the total flow. The estimated flow rates for the Yola and Mubi Road test videos are presented in Table 6 and Table 7, respectively. The YOLO11l-based variants report a total flow rate similar to the baseline, indicating that vehicles are being detected effectively and that the main issue is misclassification of the detected vehicles. However, the YOLO26l variants deviate more significantly, especially in the Mubi Road upward traffic case.
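The class-wise and total flow-rate calculation can be sketched as follows. This is an illustration under stated assumptions: the scaling 3600/T with T = 10 min is taken from the worked ESAL calculation in this section, the counts are the Yola Road downward values quoted earlier, and the function name `flow_rates` is hypothetical.

```python
def flow_rates(counts, duration_s):
    """Scale counts observed over duration_s seconds to vehicles per hour."""
    scale = 3600.0 / duration_s
    per_class = {cls: scale * n for cls, n in counts.items()}
    return per_class, sum(per_class.values())


T_SECONDS = 10 * 60  # observation window used in the worked ESAL example

baseline = {"bus": 6, "car": 108, "keke": 181, "truck": 23}
p1_counts = {"bus": 1, "car": 129, "keke": 186, "truck": 2}

base_q, base_total = flow_rates(baseline, T_SECONDS)
p1_q, p1_total = flow_rates(p1_counts, T_SECONDS)

print(f"total flow: baseline {base_total:.1f}, P1 {p1_total:.1f} veh/h")
print(f"truck flow: baseline {base_q['truck']:.1f}, P1 {p1_q['truck']:.1f} veh/h")
# total flow: baseline 1908.0, P1 1908.0 veh/h
# truck flow: baseline 138.0, P1 12.0 veh/h
```

The two totals coincide even though the truck flow differs by more than a factor of ten, which illustrates why total flow alone cannot confirm correct class composition.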

The ESAL rate was computed using Equation (11), with the LEF computed using Equation (10). We assumed the following axle loads for the vehicle classes: 0.4 tons for keke, 1.4 tons for car, 11 tons for truck, and 3.5 tons for bus (the common bus types are minibuses). The standard reference load was 8.16 tons. Therefore, for the Yola Road downward-flow traffic, the baseline keke {LEF}_{b k} was computed as

{LEF}_{b k} = {(\frac{Axle load}{Standard reference load})}^{4} = {(\frac{0.4}{8.16})}^{4} = 5.77 \times 10^{- 6}

The baseline keke ESAL rate was computed as,

{R_{ESAL}}_{b k} = (\frac{3600}{T} N_{b k}) \cdot {LEF}_{b k} = (\frac{3600}{10 \times 60} \times 181) \times 5.77 \times 10^{- 6} = 6.27 \times 10^{- 3} ESAL / h

Similarly, the individual ESAL rate was computed for all vehicle classes and aggregated for the baseline, and the four selected tuned ROI-based pipeline variants, with the results for each direction reported in Table 6 and Table 7. For the Yola Road downward flow, all selected variants produce total flow rates that remain close to the baseline (baseline = 1908.0 veh/h; variants = 1800.0–1908.0 veh/h), but their R_{ESAL} values are much smaller than the baseline 457.4969 ESAL/h because trucks are strongly under-counted: 11L-BoT-ROI gives 40.5072, 11L-Byte-ROI gives 40.4968, and both 26L variants give about 60.14. For the Yola Road upward flow, total flow is again relatively close to the baseline (1992.0 veh/h versus 1902.0–1986.0 veh/h), but class composition differs markedly across variants. The 11L variants under-count keke and over-count car, yielding lower R_{ESAL} values than the baseline 396.8284 ESAL/h (61.9146 for 11L-BoT-ROI and 181.4941 for 11L-Byte-ROI), whereas the 26L variants overestimate heavier classes, especially bus and truck, producing inflated R_{ESAL} values of 505.5641 and 684.8231.
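The LEF and ESAL-rate aggregation for the Yola Road downward flow can be reproduced from the stated axle loads and counts. The sketch below is not the authors' implementation: the fourth-power LEF, the axle loads, the 8.16-ton reference load, and the 10-minute window are taken from the text, while the function and constant names are illustrative assumptions.

```python
AXLE_LOAD_TONS = {"keke": 0.4, "car": 1.4, "truck": 11.0, "bus": 3.5}
STANDARD_LOAD_TONS = 8.16


def lef(axle_load, standard=STANDARD_LOAD_TONS):
    """Fourth-power load equivalency factor."""
    return (axle_load / standard) ** 4


def esal_rate(counts, duration_s):
    """Sum of hourly class flow times class LEF, in ESAL/h."""
    scale = 3600.0 / duration_s
    return sum(scale * n * lef(AXLE_LOAD_TONS[c]) for c, n in counts.items())


T_SECONDS = 10 * 60
baseline = {"bus": 6, "car": 108, "keke": 181, "truck": 23}
p1_counts = {"bus": 1, "car": 129, "keke": 186, "truck": 2}

print(f"keke LEF      = {lef(AXLE_LOAD_TONS['keke']):.2e}")
print(f"baseline ESAL = {esal_rate(baseline, T_SECONDS):.2f} ESAL/h")
print(f"P1 ESAL       = {esal_rate(p1_counts, T_SECONDS):.2f} ESAL/h")
```

Under these assumptions the two totals come out at roughly 457.50 and 40.51 ESAL/h, in line with the reported values. Almost all of the baseline total is contributed by the truck term, so missing 21 of 23 trucks removes about 90% of the estimated loading.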

A similar pattern is observed in the Mubi Road video, but with substantially larger distortions. In the downward flow, the total flow rate of all selected variants remains close to the baseline (1662.9 veh/h versus 1645.7–1688.6 veh/h), yet all variants massively inflate R_{ESAL} relative to the baseline 57.1678 ESAL/h because of severe over-counting of heavy vehicles, especially trucks: 852.1921 for 11L-BoT-ROI, 1191.4804 for 11L-Byte-ROI, 972.5111 for 26L-BoT-ROI, and 999.5745 for 26L-Byte-ROI. In the upward flow, baseline R_{ESAL} is 141.6572 ESAL/h, while all four variants still overestimate pavement loading (171.2581–256.1952 ESAL/h), although the inflation is smaller than in the downward case. Overall, these results highlight that total flow rate can remain numerically close to the manual baseline even when class composition is substantially wrong, while ESAL-rate is highly sensitive to errors in heavy-vehicle recognition. Consequently, although keke-aware tuning improves mixed-traffic representation, reliable pavement-loading inference additionally requires robust heavy-vehicle detection and careful calibration of class-specific load equivalency factors.

5. Discussion

In this section, we discuss the model performance, implications for road workload estimation and pavement loading, predictive maintenance scheduling, limitations, and future work.

5.1. Model Performance Implication

The validation results indicate that the fine-tuned YOLO11l model achieves strong overall detection performance with expected class-dependent variation. Large and visually distinctive vehicles (e.g., buses) are detected more consistently, whereas classes with higher intra-class variability or more frequent truncation/occlusion (e.g., trucks) exhibit lower recall. The normalized confusion matrix suggests that missed detections (objects predicted as background) and a limited set of inter-class confusions contribute most to error. These patterns are consistent with the downstream counting results and support the use of tracking-by-detection and ROI-based temporal filtering, which can reduce the impact of transient frame-level errors by enforcing temporal consistency of vehicle identities and suppressing short-lived false positives.

The road test evaluation shows that keke-aware fine-tuning is necessary for meaningful traffic composition estimation in the study setting. Across the selected tuned ROI-based pipeline variants (P1–P4), the most reliable results are obtained for the Yola Road downward flow, where keke counts remain close to the manual baseline and aggregate counting errors are comparatively low. In contrast, performance degrades in the Yola Road upward flow and more substantially in the Mubi Road video, indicating that scene condition, viewpoint, and traffic complexity strongly affect counting accuracy.

However, the same pipelines still undercount keke in several settings, especially in the Yola Road upward direction and in the Mubi scene, indicating direction and scene-dependent sensitivity. This asymmetry is plausibly attributable to the similarity in the rear view of different vehicle types and not having enough keke rear-view images for reliable distinction during training. From an ITS measurement standpoint, this finding motivates a more stringent evaluation practice: reporting bidirectional performance separately rather than only aggregate totals, and ensuring that training data cover both directions with comparable diversity. While the YOLO-based system demonstrates strong promise for detecting and classifying keke and other vehicles, the results suggest that its current reliability is highest in the Yola downward stream and remains less consistent for opposite-direction and more visually challenging scenes. In its present state, the model serves as a high-fidelity proof-of-concept for automated counting but should be viewed as a supplement to, rather than a replacement for, comprehensive manual surveys.

5.2. Road Workload Estimation

A major motivation for keke-aware counting is not merely reporting traffic volume but producing usable inputs for infrastructure planning and maintenance. The flow-rate and ESAL-rate analysis illustrates the practical consequences of misclassification for infrastructure-oriented metrics. While total flow can appear similar across systems, the allocation of flow into incorrect classes can severely bias load-based indicators. In particular, ESAL-rate estimates are highly sensitive to heavy-vehicle counts; therefore, even when total hourly flow remains close to the manual baseline, errors in truck and bus recognition can strongly distort pavement-loading estimates. This is evident in the Yola Road downward-flow case, where all selected variants produce total flow rates close to baseline, yet all substantially underestimate R_{ESAL} because trucks are under-counted. For Yola Road upward flow, the 11L variants yield lower-than-baseline R_{ESAL} values, whereas the 26L variants inflate R_{ESAL} through higher bus and truck counts. The distortion is even more pronounced for Mubi Road, particularly in the downward flow, where all selected variants produce very large ESAL-rate overestimates because of severe heavy-vehicle over-counting. Thus, although keke-aware tuning improves mixed-traffic representation by explicitly accounting for a locally prevalent class, the results show that reliable pavement-loading inference still depends critically on robust heavy-vehicle discrimination and scene-specific calibration.

5.3. Maintenance Scheduling

Integrating the counting pipeline with load-based indicators provides a pathway from traffic monitoring to actionable maintenance planning. In a deployment setting, short-duration counts can be converted into hourly flow and combined with load equivalency factors to estimate an ESAL-rate proxy; cumulative ESAL can then be tracked over time against design or intervention thresholds. The comparative results emphasize that the utility of such a framework is contingent on classification fidelity: false-positive heavy-vehicle counts can dominate ESAL-rate, while missing or under-counting a prevalent local class yields incomplete demand representation. The results show that total flow alone is not a sufficient safeguard, because a pipeline may reproduce near-baseline hourly totals while still producing strongly biased class composition and, consequently, misleading load estimates. A practical next step is to build a longitudinal load profile for target corridors through repeated sampling across peak/off-peak periods, multiple days, and seasonal conditions, enabling maintenance triggers to be based on statistically grounded cumulative loading rather than sporadic manual surveys.
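The idea of tracking cumulative ESAL against an intervention threshold can be sketched in a few lines. Everything numeric below is a placeholder chosen for illustration, not a measurement or a design value from this study, and the function names are hypothetical.

```python
def cumulative_esal(samples):
    """Running cumulative ESAL from (hours_represented, esal_per_hour) pairs."""
    total = 0.0
    running = []
    for hours, rate in samples:
        total += hours * rate
        running.append(total)
    return running


def first_exceedance(running, threshold):
    """Index of the first sample at which the threshold is reached, else None."""
    for index, value in enumerate(running):
        if value >= threshold:
            return index
    return None


# Placeholder survey periods: (hours each sample stands for, ESAL/h estimate).
samples = [(12, 400.0), (12, 60.0), (12, 400.0)]
THRESHOLD_ESAL = 10_000.0  # placeholder intervention threshold

running = cumulative_esal(samples)
print(running)                                    # [4800.0, 5520.0, 10320.0]
print(first_exceedance(running, THRESHOLD_ESAL))  # 2
```

Because each sample is multiplied by the number of hours it represents, a biased ESAL-rate estimate from a single short video propagates linearly into the cumulative total, which is why repeated sampling across periods matters.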

5.4. Limitations and Future Work

This study has several limitations. First, data were collected under a limited set of conditions (predominantly daytime, dry weather, and clear visibility); nighttime and adverse weather robustness were not assessed, and such conditions are operationally relevant in many developing-country settings. Second, the evaluation used a limited number of held-out test videos for system-level counting; while sufficient to reveal clear failure modes, broader validation across multiple sites and times would strengthen generalizability. Third, the tuned model’s bidirectional and cross-scene performance differs markedly for keke and heavy vehicles, suggesting that the training distribution may not fully represent viewpoint, illumination, congestion level, and lane-position variability across deployment settings. Also, class imbalance was not explicitly addressed in the present training setup; therefore, weaker performance on under-represented or visually variable classes may partly reflect the absence of imbalance-aware strategies such as class-weighted loss, targeted augmentation, or hard-example-focused sampling. Finally, ROI placement, confidence thresholds, and tracker association parameters influence counting error and may need site-specific calibration for reliable deployment.

Future work should prioritize improving bidirectional robustness, particularly for the keke class and for heavy-vehicle discrimination under challenging scene conditions. This can be addressed by collecting additional data that balances direction, viewpoint, and scene difficulty, and by tuning operating thresholds using validation-derived criteria (e.g., near the F1-optimal region) combined with track-level filtering. A second direction is evaluation at scale: extending testing to multiple road typologies (arterials, markets, roundabouts) and multiple cities to quantify transferability. The comparison across YOLO11l/YOLO26l backbones and BoT-SORT/ByteTrack also suggests that detector–tracker selection should be treated as a deployment variable rather than a fixed design choice, since the best-performing configuration depends on flow direction and scene. Finally, strengthening the pavement-engineering linkage will require locally calibrated passenger-car equivalents (PCU/PCE) and axle-load equivalency factors (LEFs) for keke and regional truck types, enabling more accurate conversion from class counts to load-based maintenance indicators.

6. Conclusions

This paper presents a keke-aware vehicle-counting framework for video-based traffic measurement in mixed-traffic environments, together with a publicly released dataset that explicitly includes the keke (auto-rickshaw/three-wheeler) class. Building on fine-tuned YOLO-based detectors and ROI-based tracking-by-detection pipelines using BoT-SORT and ByteTrack, we evaluated both detection quality and system-level counting performance on held-out roadside test videos. The results show that keke-aware fine-tuning enables explicit keke counting in real traffic scenes and yields the most reliable measurements in the Yola Road downward flow, where keke counts across the selected tuned ROI-based variants remain close to the manual baseline and aggregate counting errors are comparatively low. This demonstrates that localized class extension and transfer learning can materially improve measurement fidelity in developing-country contexts.

Beyond raw counts, we examined flow-rate and ESAL-rate proxies to highlight the implications of class-level errors for infrastructure-focused analytics. The findings indicate that total flow alone can mask substantial class misallocations, and that pavement-loading indicators are highly sensitive to heavy-vehicle misclassification, even when total hourly flow remains close to the manual baseline. In our analysis, some variants substantially underestimate ESAL-rate through conservative truck detection, whereas others inflate ESAL-rate through over-counting of buses and trucks, especially in the more challenging Mubi Road scene. Thus, although the keke-aware framework improves traffic composition monitoring by restoring a locally important class, reliable downstream planning and maintenance analyses still depend on accurate heavy-vehicle discrimination.

At the same time, the evaluation revealed direction and scene-dependent performance, with larger keke and car counting errors in the Yola upward flow and broader class-composition distortions in the Mubi Road video. These limitations motivate future work on bidirectional and cross-scene robustness through viewpoint-balanced data collection, improved augmentation, and calibration of confidence, ROI, and tracking parameters. Overall, this study highlights practical considerations for ITS deployments in mixed-traffic regions: accurate traffic measurement requires not only modern detection and tracking models, but also representative datasets, locally appropriate class ontologies, and evaluation protocols that separately assess direction and scene difficulty. The released dataset and baseline pipeline provide a foundation for continued research toward inclusive, reliable, and infrastructure-relevant traffic monitoring in developing countries where keke are prevalent.

Author Contributions

Conceptualization, A.U.A., R.J.M., I.T.T. and A.O.A.; methodology, M.U.A., A.U.A., R.J.M. and A.O.A.; software, M.U.A. and A.O.A.; validation, M.U.A. and A.O.A.; formal analysis, M.U.A. and A.O.A.; investigation, M.U.A. and A.O.A.; resources, I.T.T., S.Y.M., I.M.V. and A.O.A.; data curation, M.U.A. and A.O.A.; writing—original draft preparation, M.U.A. and A.O.A.; writing—review and editing, M.U.A., A.U.A. and A.O.A.; visualization, M.U.A. and A.O.A.; supervision, I.T.T., S.Y.M., I.M.V. and A.O.A.; project administration, A.O.A.; funding acquisition, A.U.A., R.J.M. and A.O.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Nigerian Tertiary Education Trust Fund (TETFUND)—MAU 2021 IBR 7th Batch grant; and The Open University STEM-CC 2024/2025 Pump Priming research grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The training, inference, and evaluation code is publicly available in the GitHub repository, https://github.com/abioyeayo/Keke-Aware-Vehicle-Counting, accessed on 23 April 2026. The keke-aware dataset is published on Figshare, https://doi.org/10.6084/m9.figshare.31968651.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

ESAL	Equivalent Single Axle Load
LEF	Load Equivalency Factor
HPC	High-Performance Computer
ITS	Intelligent Transportation Systems
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
PCE	Passenger-Car Equivalent
RMSE	Root Mean Squared Error
ROI	Region of Interest
WAPE	Weighted Absolute Percentage Error

References

Tan, D.M.; Kieu, L.M. TRAMON: An automated traffic monitoring system for high density, mixed and lane-free traffic. IATSS Res. 2023, 47, 468–481.
Mandal, V.; Adu-Gyamfi, Y. Object Detection and Tracking Algorithms for Vehicle Counting: A Comparative Analysis. J. Big Data Anal. Transp. 2020, 2, 251–261.
Akujobi, M.; Abioye, A.O. Keke-Aware Vehicle Counting Dataset. 2026. Dataset (22 April 2026). Figshare. Available online: https://figshare.com/articles/dataset/Keke-Aware_Vehicle_Counting_Dataset/31968651 (accessed on 23 April 2026).
Klein, L.A.; Mills, M.K.; Gibson, D.R. Traffic Detector Handbook, 3rd ed.; Technical Report FHWA-HRT-06-108; Federal Highway Administration, Turner-Fairbank Highway Research Center: McLean, VA, USA, 2006.
Maity, M.; Banerjee, S.; Sinha Chaudhuri, S. Faster R-CNN and YOLO based Vehicle detection: A Survey. In Proceedings of the 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; pp. 1442–1447.
Stauffer, C.; Grimson, W.E.L. Adaptive background mixture models for real-time tracking. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), Fort Collins, CO, USA, 23–25 June 1999; Volume 2, pp. 246–252.
Zivkovic, Z. Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, Washington, DC, USA, 23–26 August 2004; Volume 2, pp. 28–31.
Lai, A.H.S.; Yung, N.H.C. Vehicle-type identification through automated virtual loop assignment and block-based direction-biased motion estimation. IEEE Trans. Intell. Transp. Syst. 2000, 1, 86–97.
Liu, F.; Zeng, Z.; Jiang, R. A video-based real-time adaptive vehicle-counting system for urban roads. PLoS ONE 2017, 12, e0186098.
Lin, H.; Yuan, Z.; He, B.; Kuai, X.; Li, X.; Guo, R. A Deep Learning Framework for Video-Based Vehicle Counting. Front. Phys. 2022, 10, 829734.
Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468.
Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
Luo, W.; Xing, J.; Milan, A.; Zhang, X.; Liu, W.; Kim, T.K. Multiple object tracking: A literature review. Artif. Intell. 2021, 293, 103448.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; NIPS’15. Volume 1, pp. 91–99.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Zhou, Y. A YOLO-NL object detector for real-time detection. Expert Syst. Appl. 2024, 238, 122256.
Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
Gheorghe, C.; Duguleana, M.; Boboc, R.G.; Postelnicu, C.C. Analyzing Real-Time Object Detection with YOLO Algorithm in Automotive Applications: A Review. CMES-Comput. Model. Eng. Sci. 2024, 141, 1939–1981.
Zhang, K.; Zhu, N.; Zhao, F.; Zhang, Q. MCViM-YOLO: Remote Sensing Vehicle Detection for Sustainable Intelligent Transportation. Sustainability 2026, 18, 2836.
Ji, A.; Ma, X. Vehicle detection and classification for traffic management and autonomous systems using YOLOv10. Alex. Eng. J. 2025, 127, 804–816.
Liu, C.; Tao, Y.; Liang, J.; Li, K.; Chen, Y. Object Detection Based on YOLO Network. In Proceedings of the 2018 IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 14–16 December 2018; pp. 799–803.
Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-object Tracking by Associating Every Detection Box. In Proceedings of the Computer Vision–ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer: Cham, Switzerland, 2022; pp. 1–21.
Aharon, N.; Orfaig, R.; Bobrovsky, B.Z. BoT-SORT: Robust Associations Multi-Pedestrian Tracking. arXiv 2022, arXiv:2206.14651.
Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision–ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer: Cham, Switzerland, 2014; pp. 740–755.
Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2633–2642. [Google Scholar] [CrossRef]
Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11618–11628. [Google Scholar] [CrossRef]
Varma, G.; Subramanian, A.; Namboodiri, A.; Chandraker, M.; Jawahar, C. IDD: A Dataset for Exploring Problems of Autonomous Navigation in Unconstrained Environments. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1743–1751. [Google Scholar] [CrossRef]
Paranjape, B.A.; Naik, A.A. DATS_2022: A Versatile Indian Dataset for Object Detection in Unstructured Traffic Conditions. Data Brief 2022, 43, 108470. [Google Scholar] [CrossRef] [PubMed]
Dokania, S.; Hafez, A.H.A.; Subramanian, A.; Chandraker, M.; Jawahar, C.V. IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 4482–4491. [Google Scholar]
Hasan, M.M.; Wang, Z.; Hussain, M.A.I.; Fatima, K. Bangladeshi Native Vehicle Classification Based on Transfer Learning with Deep Convolutional Neural Network. Sensors 2021, 21, 7545. [Google Scholar] [CrossRef] [PubMed]
Saha, B.; Islam, M.J.; Mostaque, S.K.; Bhowmik, A.; Taton, T.K.; Chowdhury, M.N.H.; Reaz, M.B.I. Bangladeshi Native Vehicle Detection in Wild. arXiv 2024, arXiv:2405.12150. [Google Scholar] [CrossRef]
Gao, H. Vehicle Detection and Tracking Based on YOLOv11. In Proceedings of the 2nd International Conference on Data Science and Engineering, Lisbon, Portugal, 23–25 April 2025. [Google Scholar]

Figure 1. Detections from the tuned YOLO11l model on the Yola Road test video (daytime, high visibility, moderate traffic), showing the track ID, class, and confidence of each vehicle. The yellow rectangle marks the vehicle counting region.

Figure 2. Detections from the tuned YOLO26l model on the Mubi Road test video (evening, low visibility, dense traffic), showing the track ID, class, and confidence of each vehicle. The yellow rectangle marks the vehicle counting region.

Figure 3. Training history for the fine-tuned YOLO11l model over 100 epochs. Top row: training losses and validation precision/recall. Bottom row: validation losses and mAP metrics. Solid lines show per-epoch values; dashed lines show smoothed trends.

Figure 4. Training history for the fine-tuned YOLO26l model over 100 epochs. Top row: training losses and validation precision/recall. Bottom row: validation losses and mAP metrics. Solid lines show per-epoch values; dashed lines show smoothed trends.

Figure 5. Confusion matrix of the trained YOLO11l model on the six vehicle classes.

Figure 6. Confusion matrix of the trained YOLO26l model on the six vehicle classes.

Figure 7. Overall count WAPE by experiment, split by detector family. Lower WAPE indicates lower counting error. The tuned + tracker + ROI variants achieve lower WAPE than the other variants.

Figure 8. Grouped bar charts comparing the per-class vehicle counts produced by each counting pipeline variant (P1–P4) against the manual baseline counts, for each test video and flow direction.

Table 1. Dataset class coverage: number of images containing each class and number of instance annotations per class.

Class	# Images	# Annotations
bicycle	3304	7174
bus	4218	6419
car	11,808	41,496
keke	1257	3963
motorcycle	3660	9020
truck	6377	10,384

Table 2. Class-wise validation comparison of the fine-tuned YOLO11l and YOLO26l detectors on the same validation split (2088 images; 8400 instances). Metrics are precision ($P$), recall ($R$), mAP@0.5, and COCO-style mAP@0.5:0.95. The best value between detectors is shown in bold.

Class	Images	Inst.	YOLO11l $P$	YOLO11l $R$	YOLO11l mAP@0.5	YOLO11l mAP@0.5:0.95	YOLO26l $P$	YOLO26l $R$	YOLO26l mAP@0.5	YOLO26l mAP@0.5:0.95
all	2088	8400	0.752	0.696	0.766	0.578	0.766	0.693	0.757	0.571
bicycle	459	940	0.767	0.623	0.706	0.461	0.788	0.629	0.709	0.459
bus	537	833	0.833	0.850	0.901	0.759	0.865	0.821	0.893	0.748
car	979	3756	0.753	0.698	0.751	0.521	0.759	0.681	0.746	0.517
keke	66	209	0.649	0.732	0.772	0.685	0.653	0.780	0.770	0.688
motorcycle	503	1206	0.809	0.704	0.785	0.538	0.819	0.701	0.781	0.536
truck	908	1456	0.702	0.567	0.683	0.502	0.711	0.546	0.643	0.480

Table 3. Class-wise counting errors for the Yola Road test video under the selected tuned ROI-based pipeline variants. APE is computed as $100 \cdot |\hat{N} - N| / N$ and is undefined when $N = 0$.

Flow	Class	Baseline $N$	11L-BoT-ROI $\hat{N}$	11L-BoT-ROI $|e|$	11L-BoT-ROI APE (%)	11L-Byte-ROI $\hat{N}$	11L-Byte-ROI $|e|$	11L-Byte-ROI APE (%)	26L-BoT-ROI $\hat{N}$	26L-BoT-ROI $|e|$	26L-BoT-ROI APE (%)	26L-Byte-ROI $\hat{N}$	26L-Byte-ROI $|e|$	26L-Byte-ROI APE (%)
Downward	bus	6	1	5	83.33	1	5	83.33	0	6	100.00	0	6	100.00
	car	108	129	21	19.44	127	19	17.59	133	25	23.15	133	25	23.15
	keke	181	186	5	2.76	187	6	3.31	174	7	3.87	164	17	9.39
	truck	23	2	21	91.30	2	21	91.30	3	20	86.96	3	20	86.96
Upward	bus	0	6	6	–	10	10	–	45	45	–	50	50	–
	car	106	241	135	127.36	219	113	106.60	209	103	97.17	194	88	83.02
	keke	206	79	127	61.65	93	113	54.85	38	168	81.55	41	165	80.10
	truck	20	3	17	85.00	9	11	55.00	25	5	25.00	34	14	70.00
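
The APE definition in the caption of Table 3 can be illustrated with a minimal Python sketch. The function name `ape` and the dictionary layout are illustrative assumptions, not the authors' code; the counts are copied from the 11L-BoT-ROI columns of Table 3.

```python
from typing import Dict, Optional, Tuple


def ape(baseline: int, predicted: int) -> Optional[float]:
    """Absolute percentage error in percent; None when the baseline count is zero."""
    if baseline == 0:
        return None  # APE is undefined for N = 0, as stated in the caption
    return 100.0 * abs(predicted - baseline) / baseline


# (flow, class) -> (N, N_hat), taken from the 11L-BoT-ROI columns of Table 3
yola_11l_bot: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Downward", "keke"): (181, 186),
    ("Downward", "truck"): (23, 2),
    ("Upward", "bus"): (0, 6),
    ("Upward", "car"): (106, 241),
}

ape_table = {key: ape(n, n_hat) for key, (n, n_hat) in yola_11l_bot.items()}

for key, value in ape_table.items():
    label = "undefined" if value is None else f"{value:.2f}%"
    print(key, label)
```

Returning `None` for a zero baseline keeps the undefined case explicit, which mirrors the dash entries for the upward bus rows.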

Table 4. Class-wise counting errors for the Mubi Road test video under the selected tuned ROI-based pipeline variants. APE is computed as $100 \cdot |\hat{N} - N| / N$ and is undefined when $N = 0$.

Flow	Class	Baseline $N$	11L-BoT-ROI $\hat{N}$	11L-BoT-ROI $|e|$	11L-BoT-ROI APE (%)	11L-Byte-ROI $\hat{N}$	11L-Byte-ROI $|e|$	11L-Byte-ROI APE (%)	26L-BoT-ROI $\hat{N}$	26L-BoT-ROI $|e|$	26L-BoT-ROI APE (%)	26L-Byte-ROI $\hat{N}$	26L-Byte-ROI $|e|$	26L-Byte-ROI APE (%)
Downward	bus	1	7	6	600.00	6	5	500.00	33	32	3200.00	29	28	2800.00
	car	35	136	101	288.57	125	90	257.14	76	41	117.14	65	30	85.71
	keke	156	20	136	87.18	20	136	87.18	54	102	65.38	63	93	59.62
	truck	2	30	28	1400.00	42	40	2000.00	34	32	1600.00	35	33	1650.00
Upward	bus	0	2	2	–	2	2	–	12	12	–	14	14	–
	car	17	117	100	588.24	114	97	570.59	44	27	158.82	40	23	135.29
	keke	120	20	100	83.33	23	97	80.83	40	80	66.67	44	76	63.33
	truck	5	9	4	80.00	6	1	20.00	6	1	20.00	7	2	40.00
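
Several APE values in Table 4 are very large because the baseline counts are small (for example, the downward bus class has $N = 1$). A count-weighted measure such as WAPE is less sensitive to this. The sketch below uses the standard WAPE definition, total absolute error divided by total baseline count, applied to a single flow direction. This is an assumption for illustration: the aggregation used for the overall WAPE in Figure 7 may differ, and the function name `wape` is hypothetical.

```python
from typing import Sequence


def wape(baseline: Sequence[int], predicted: Sequence[int]) -> float:
    """Weighted absolute percentage error: 100 * sum|N_hat - N| / sum N."""
    total = sum(baseline)
    if total == 0:
        raise ValueError("WAPE is undefined when all baseline counts are zero")
    abs_err = sum(abs(p - n) for n, p in zip(baseline, predicted))
    return 100.0 * abs_err / total


# Mubi Road, downward flow, classes ordered bus, car, keke, truck (Table 4)
baseline = [1, 35, 156, 2]
pred_11l_bot = [7, 136, 20, 30]
pred_26l_bot = [33, 76, 54, 34]

wape_11l_bot = wape(baseline, pred_11l_bot)
wape_26l_bot = wape(baseline, pred_26l_bot)

print(f"11L-BoT-ROI WAPE: {wape_11l_bot:.2f}%")
print(f"26L-BoT-ROI WAPE: {wape_26l_bot:.2f}%")
```

Because each class is weighted by its baseline count, the single bus in the baseline no longer dominates the result the way it does in a per-class average of APE.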

Table 5. Aggregate counting error metrics relative to the manual baseline for the selected tuned ROI-based pipeline variants. MAE and RMSE are computed across the four dominant vehicle classes (bus, car, keke, truck). MAPE is computed over classes with non-zero baseline counts, so bus is excluded for upward flow.

Video	Flow	Pipeline	MAE (veh)	RMSE (veh)	MAPE (%)
Yola Road	Downward	11L-BoT-ROI	13.00	15.26	49.21
		11L-Byte-ROI	12.75	14.69	48.89
		26L-BoT-ROI	14.50	16.66	53.49
		26L-Byte-ROI	17.00	18.37	54.87
	Upward	11L-BoT-ROI	71.25	93.11	91.34
		11L-Byte-ROI	61.75	80.25	72.15
		26L-BoT-ROI	80.25	101.10	67.91
		26L-Byte-ROI	79.25	97.04	77.71
Mubi Road	Downward	11L-BoT-ROI	67.75	85.90	593.94
		11L-Byte-ROI	67.75	84.00	711.08
		26L-BoT-ROI	51.75	59.44	1245.63
		26L-Byte-ROI	46.00	53.44	1148.83
	Upward	11L-BoT-ROI	51.50	70.75	250.52
		11L-Byte-ROI	49.25	68.60	223.81
		26L-BoT-ROI	30.00	42.64	81.83
		26L-Byte-ROI	28.75	40.33	79.54
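
The aggregate metrics in Table 5 can be reproduced from the per-class counts in Table 3. The following is a minimal sketch under the definitions given in the caption of Table 5 (MAE and RMSE over all four classes, MAPE over classes with non-zero baseline); the function name `aggregate_errors` is an illustrative assumption rather than the authors' implementation.

```python
import math
from typing import Dict, Sequence


def aggregate_errors(baseline: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """MAE and RMSE over all classes; MAPE over classes with a non-zero baseline."""
    errors = [abs(p - n) for n, p in zip(baseline, predicted)]
    mae = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    apes = [100.0 * e / n for e, n in zip(errors, baseline) if n != 0]
    mape = sum(apes) / len(apes) if apes else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


# Yola Road, 11L-BoT-ROI, classes ordered bus, car, keke, truck (Table 3)
down = aggregate_errors([6, 108, 181, 23], [1, 129, 186, 2])
up = aggregate_errors([0, 106, 206, 20], [6, 241, 79, 3])

for name, result in (("Downward", down), ("Upward", up)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

For the upward flow, the bus class still contributes to MAE and RMSE but is dropped from MAPE because its baseline count is zero.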

Table 6. Road-usage estimation for the Yola Road test video using flow-rate and ESAL-rate analysis for the selected tuned ROI-based pipeline variants. Flow rate is reported in veh/h and $R_{ESAL}$ in ESAL/h.

Metric	Direction	Pipeline	Keke	Bus	Car	Truck	Total
Flow rate (veh/h)	Downward	Baseline	1086.0	36.0	648.0	138.0	1908.0
		11L-BoT-ROI	1116.0	6.0	774.0	12.0	1908.0
		11L-Byte-ROI	1122.0	6.0	762.0	12.0	1902.0
		26L-BoT-ROI	1044.0	0.0	798.0	18.0	1860.0
		26L-Byte-ROI	984.0	0.0	798.0	18.0	1800.0
	Upward	Baseline	1236.0	0.0	636.0	120.0	1992.0
		11L-BoT-ROI	474.0	36.0	1446.0	18.0	1974.0
		11L-Byte-ROI	558.0	60.0	1314.0	54.0	1986.0
		26L-BoT-ROI	228.0	270.0	1254.0	150.0	1902.0
		26L-Byte-ROI	246.0	300.0	1164.0	204.0	1914.0
$R_{ESAL}$ (ESAL/h)	Downward	Baseline	0.0063	1.2185	0.5615	455.7107	457.4969
		11L-BoT-ROI	0.0064	0.2031	0.6706	39.6270	40.5072
		11L-Byte-ROI	0.0065	0.2031	0.6602	39.6270	40.4968
		26L-BoT-ROI	0.0060	0.0000	0.6914	59.4405	60.1380
		26L-Byte-ROI	0.0057	0.0000	0.6914	59.4405	60.1376
	Upward	Baseline	0.0071	0.0000	0.5511	396.2701	396.8284
		11L-BoT-ROI	0.0027	1.2185	1.2529	59.4405	61.9146
		11L-Byte-ROI	0.0032	2.0308	1.1385	178.3216	181.4941
		26L-BoT-ROI	0.0013	9.1385	1.0865	495.3377	505.5641
		26L-Byte-ROI	0.0014	10.1539	1.0086	673.6592	684.8231
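
The flow rates in Table 6 are the counts from Table 3 scaled to an hourly rate. The sketch below assumes a 10-minute observation window for the Yola Road video; this value is inferred from the ratio between the two tables (for example, 1086.0 veh/h for 181 keke) and is not stated in this part of the article. The function name `flow_rate_veh_per_h` is hypothetical.

```python
from typing import Dict


def flow_rate_veh_per_h(count: int, duration_min: float) -> float:
    """Scale a vehicle count observed over duration_min minutes to vehicles per hour."""
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    return count * 60.0 / duration_min


# Assumption: window length inferred from Table 6 / Table 3, not quoted from the text
YOLA_WINDOW_MIN = 10.0

# Yola Road, downward flow, manual baseline counts (Table 3)
yola_down_baseline: Dict[str, int] = {"keke": 181, "bus": 6, "car": 108, "truck": 23}

rates = {cls: flow_rate_veh_per_h(n, YOLA_WINDOW_MIN) for cls, n in yola_down_baseline.items()}
total_rate = sum(rates.values())

for cls, rate in rates.items():
    print(f"{cls}: {rate:.1f} veh/h")
print(f"total: {total_rate:.1f} veh/h")
```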

Table 7. Road-usage estimation for the Mubi Road test video using flow-rate and ESAL-rate analysis for the selected tuned ROI-based pipeline variants. Flow rate is reported in veh/h and $R_{ESAL}$ in ESAL/h.

Metric	Direction	Pipeline	Keke	Bus	Car	Truck	Total
Flow rate (veh/h)	Downward	Baseline	1337.1	8.6	300.0	17.1	1662.9
		11L-BoT-ROI	171.4	60.0	1165.7	257.1	1654.3
		11L-Byte-ROI	171.4	51.4	1071.4	360.0	1654.3
		26L-BoT-ROI	462.9	282.9	651.4	291.4	1688.6
		26L-Byte-ROI	540.0	248.6	557.1	300.0	1645.7
	Upward	Baseline	1028.6	0.0	145.7	42.9	1217.1
		11L-BoT-ROI	171.4	17.1	1002.9	77.1	1268.6
		11L-Byte-ROI	197.1	17.1	977.1	51.4	1242.9
		26L-BoT-ROI	342.9	102.9	377.1	51.4	874.3
		26L-Byte-ROI	377.1	120.0	342.9	60.0	900.0
$R_{ESAL}$ (ESAL/h)	Downward	Baseline	0.0077	0.2901	0.2599	56.6100	57.1678
		11L-BoT-ROI	0.0010	2.0308	1.0101	849.1503	852.1921
		11L-Byte-ROI	0.0010	1.7407	0.9284	1188.8104	1191.4804
		26L-BoT-ROI	0.0027	9.5737	0.5644	962.3703	972.5111
		26L-Byte-ROI	0.0031	8.4132	0.4827	990.6754	999.5745
	Upward	Baseline	0.0059	0.0000	0.1263	141.5251	141.6572
		11L-BoT-ROI	0.0010	0.5802	0.8689	254.7451	256.1952
		11L-Byte-ROI	0.0011	0.5802	0.8467	169.8301	171.2581
		26L-BoT-ROI	0.0020	3.4813	0.3268	169.8301	173.6402
		26L-Byte-ROI	0.0022	4.0616	0.2971	198.1351	202.4959
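
Within Tables 6 and 7, each per-class $R_{ESAL}$ value appears to scale linearly with the corresponding flow rate, which is consistent with multiplying the flow rate by a fixed per-vehicle load factor. The sketch below back-computes such a factor for trucks from the Yola Road downward baseline row of Table 6 and applies it to the Mubi Road downward truck counts of Table 4. Both the factor and the 7-minute Mubi Road window are inferred from the tabulated values and are assumptions; they are not the equivalency factors or durations as stated in the article's methods section. All names are hypothetical.

```python
from typing import Dict

# Back-computed from Table 6 (Yola Road, downward, baseline): ESAL/h divided by veh/h.
# Assumption: this is an implied factor, not a value quoted from the article.
TRUCK_FLOW_REF = 138.0
TRUCK_ESAL_REF = 455.7107
IMPLIED_ESAL_PER_TRUCK = TRUCK_ESAL_REF / TRUCK_FLOW_REF

# Assumption: window length inferred from Table 7 / Table 4 (e.g. 156 keke -> 1337.1 veh/h)
MUBI_WINDOW_MIN = 7.0


def esal_rate(count: int, duration_min: float, esal_per_vehicle: float) -> float:
    """ESAL per hour for one class: hourly flow rate times a per-vehicle load factor."""
    flow = count * 60.0 / duration_min
    return flow * esal_per_vehicle


# Mubi Road, downward flow, truck counts (Table 4)
mubi_down_truck_counts: Dict[str, int] = {
    "Baseline": 2,
    "11L-BoT-ROI": 30,
    "11L-Byte-ROI": 42,
}

truck_esal = {
    name: esal_rate(n, MUBI_WINDOW_MIN, IMPLIED_ESAL_PER_TRUCK)
    for name, n in mubi_down_truck_counts.items()
}

for name, value in truck_esal.items():
    print(f"{name}: {value:.2f} ESAL/h")
```

Under these assumptions the results agree with the truck column of Table 7 to within rounding, which illustrates how truck over-counting in low-visibility footage inflates the estimated pavement load far more than errors in the keke or car classes.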

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Akujobi, M.U.; Abubakar, A.U.; Mailabari, R.J.; Thuku, I.T.; Musa, S.Y.; Visa, I.M.; Abioye, A.O. Keke-Aware Vehicle Counting for Traffic Measurement Using YOLO: Dataset and Field Evaluation. Appl. Sci. 2026, 16, 4316. https://doi.org/10.3390/app16094316

