4.1. Datasets, Metrics, and Controlled Attribution Protocol
We evaluate the proposed method on two standard pedestrian multi-object tracking benchmarks from MOTChallenge: MOT17 and MOT20 [27]. MOT17 covers pedestrian tracking sequences from multiple cameras and scenes, with substantial occlusion, interaction, and appearance ambiguity. MOT20 focuses on extremely crowded scenes and therefore places much stronger pressure on identity preservation under dense overlap, mutual occlusion, and local confusion. Evaluating both benchmarks allows us to examine not only the general effectiveness of HMP, but also its robustness as crowd density and association difficulty increase.
The main benchmark results are reported on the official MOTChallenge test sets. The ablation experiments are conducted on the MOT17 training set, where the first half is used for upstream detector/ReID training or calibration when needed, and the second half is reserved for validation. Taken together, these two evidence layers form the controlled attribution protocol of this paper: the official server results establish the external competitiveness of the complete tracker, whereas the controlled validation experiments isolate which parts of the proposed memory redesign are responsible for the observed gains. This protocol avoids repeated submissions to the public server and provides a controlled environment for analyzing individual design choices. Unless otherwise stated, all ablation results are obtained under identical detector, ReID, motion-model, and solver settings so that the observed differences can be attributed to the appearance-memory design itself.
The evaluation metrics are higher-order tracking accuracy (HOTA), multiple object tracking accuracy (MOTA), identity F1 score (IDF1), and identity switches (IDSW) [28,29,30]. HOTA reflects overall detection and association quality, MOTA summarizes missed detections, false positives, and association errors, IDF1 focuses on identity consistency, and IDSW counts the number of identity switches. Unless otherwise specified, in all result tables, ↑ and ↓ indicate that higher and lower values are better, respectively, and bold values denote the best or tied-best result in each column. Because HMP primarily targets identity modeling rather than detector redesign, we place particular emphasis on IDF1 and IDSW while still reporting HOTA and MOTA for overall completeness.
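As a concrete reading of two of these metrics, the following minimal sketch computes MOTA and IDF1 from aggregate counts using the standard CLEAR-MOT and identity-metric definitions. The function names and the example counts are illustrative assumptions and are not taken from the result tables; reported scores should always come from the official MOTChallenge evaluation.

```python
def mota(num_gt: int, fn: int, fp: int, idsw: int) -> float:
    """CLEAR-MOT accuracy: 1 - (FN + FP + IDSW) / GT."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to exercise the formulas.
example_mota = mota(num_gt=1000, fn=120, fp=80, idsw=15)
example_idf1 = idf1(idtp=800, idfp=150, idfn=200)
```

Note that IDSW enters MOTA only as one term of the numerator, which is one reason identity-oriented changes are more visible in IDF1 and IDSW than in MOTA.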
4.2. Implementation Details and Fair Comparison Protocol
For the detector and ReID configuration, we distinguish clearly between the official benchmark submissions and the controlled ablations. For the benchmark results in Table 2 and Table 3, the upstream detector/ReID stack follows the public BoT-SORT-ReID-style configuration used for MOTChallenge submission: a YOLOX-X detector implemented with YOLOX version 0.3.0 and initialized from Common Objects in Context (COCO) pretraining [31], together with an SBS-S50 ReID branch implemented with FastReID version 1.4.0 [7]. The MOT17 detector is trained using the public pedestrian-tracking training mixture that includes MOT17, CityPersons, ETHZ, CrowdHuman, and WiderPerson [27,32,33,34]; the MOT20 detector follows the corresponding public MOT20 pedestrian-training setting. We describe these details to make the official submission setting transparent, but we do not treat the official tables as strict module-level attribution evidence. Accordingly, the official HMP submission should be understood as the complete HMP tracker built on top of this shared detector/ReID stack, rather than as the untouched BoT-SORT-ReID baseline with a single module toggled on. By contrast, in the controlled ablations under BoT-SORT-ReID and Deep OC-SORT, we insert HMP into the original reference frameworks while keeping the detector, ReID extractor, motion prediction/gating pipeline, one-to-one assignment primitive, and track life-cycle rules fixed. This design provides a tighter test of whether the observed gain comes from the proposed memory mechanism itself.
The method was implemented in Python 3.8. All controlled experiments were conducted on a workstation equipped with an NVIDIA GeForce RTX 3090 graphics processing unit (GPU) (NVIDIA Corporation, Santa Clara, CA, USA) and an Intel Core i7-10700 central processing unit (CPU) (Intel Corporation, Santa Clara, CA, USA).
For motion modeling, we follow the noise-scale-adaptive (NSA) Kalman filter and camera motion compensation (CMC) used in the public BoT-SORT-style tracking pipeline, together with Hungarian matching and the standard track initialization, confirmation, and termination procedures used in mainstream frameworks [1,25,26]. HMP does not replace the detector, ReID extractor, motion prediction/gating pipeline, or the underlying one-to-one assignment primitive; it changes how appearance evidence is represented, admitted into memory, and scheduled across the two association stages.
Unless explicitly varied, all controlled experiments use the same default HMP configuration. This configuration fixes the memory capacity (M long-term prototypes and S short-queue entries per track), eight reliability-related parameters, two Stage 2 control parameters, and five prototype-maintenance parameters; the default values are listed in Appendix A.
These values are not re-tuned for different host trackers in the controlled experiments or for the additional DanceTrack-val check, and the same HMP memory parameters are kept between the MOT17 and MOT20 official submissions. Only the benchmark-specific detector-training setting follows the corresponding public MOTChallenge practice. The default setting is therefore intended to represent a stable operating point rather than a globally optimal configuration for every tracker or dataset. In particular, the two memory-admission thresholds control the trade-off between long-memory purity and short-term adaptability, a prototype-maintenance parameter bounds the speed of prototype movement, and the advantage margin together with the ambiguity gap controls how conservative Stage 2 residual recovery remains. The sensitivity analyses in Section 4.4.6 further examine whether the observed identity gains depend on a narrowly tuned parameter setting.
Appendix A reports the same values together with the functional role of each key parameter, and Appendix B provides a compact pseudocode summary of the HMP inference and memory-update flow.
To make the evaluation protocol explicit, we separate the experiments into two groups. Table 2 and Table 3 report the official test-set results and mainly serve as external positioning for the complete HMP tracker under the MOTChallenge protocol. Because the compared methods do not share identical end-to-end pipelines, these results should not be interpreted as strict module-level attribution evidence for the memory module alone. By contrast, Tables 4–11 provide controlled or diagnostic evidence under matched settings, where HMP is inserted into the reference framework while the upstream detector/ReID configuration, motion prediction/gating, assignment primitive, and track life-cycle rules are kept fixed within each comparison. Throughout this paper, these controlled experiments therefore constitute the primary evidence chain for the claim that redesigning track memory improves identity continuity under fixed upstream modules. Since HMP targets track-level identity modeling rather than detector quality, the following analysis emphasizes IDF1 and IDSW as the most direct indicators of identity continuity, while still reporting HOTA and MOTA to verify that the identity gains are not obtained by sacrificing overall tracking quality.
4.3. Benchmark Results on Official MOTChallenge Test Sets
All results in this subsection are taken directly from the official MOTChallenge evaluation server. The compared methods include FairMOT [8], ByteTrack [14], OC-SORT [15], StrongSORT++ [10], Deep OC-SORT [16], BoT-SORT-ReID, and BoostTrack++ [17]. Our official submission follows a BoT-SORT-ReID-style upstream detector/ReID configuration but uses the complete HMP tracker described in Section 3 rather than the untouched original BoT-SORT-ReID pipeline. Accordingly, these tables are used mainly for external positioning of the complete system under the official benchmark protocol. They answer whether the full HMP tracker is externally competitive, especially on identity-oriented metrics. The stricter question of why the gain appears, and whether it can be attributed specifically to the proposed memory redesign, is addressed later by the controlled ablations under fixed upstream settings.
4.3.1. MOT17 Test
As shown in Table 2, the complete HMP tracker reaches 66.6 HOTA, 81.0 MOTA, 82.6 IDF1, and 882 IDSW on the MOT17 test set. These numbers place the complete system among the competitive online trackers in this comparison, with particularly strong identity-oriented performance. Because the official benchmark compares complete systems rather than identical pipelines, the differences in Table 2 should be interpreted as external positioning rather than strict module-level attribution.
In particular, the numerical gaps between HMP and other official submissions are not used here as evidence that these gains are caused solely by the HMP module, since the compared trackers may use different detectors, ReID extractors, training data, and engineering pipelines.
The result nevertheless shows that the complete HMP tracker remains competitive in overall metrics while obtaining a favorable IDF1/IDSW profile.
4.3.2. MOT20 Test
As shown in Table 3, the complete HMP tracker obtains 65.5 HOTA, 77.5 MOTA, 80.8 IDF1, and 752 IDSW on the MOT20 test set. The most notable pattern is the low number of identity switches under extremely crowded conditions. We again interpret this result cautiously because the compared trackers use different end-to-end pipelines.
Therefore, Table 3 is used only to position the complete HMP-based tracker under the official benchmark protocol, not to quantify the standalone contribution of HMP relative to trackers built on different upstream stacks.
The controlled experiments below are used instead to isolate the effect of the memory redesign itself.
Overall, the official test-set results show that the complete HMP tracker is externally competitive and has a favorable identity-oriented profile.
These official results provide benchmark-level context for the complete submitted system, while the module-level attribution is deliberately based on the controlled ablations under fixed upstream settings.
This observation motivates the controlled ablations below, where we separate more cleanly the contributions of memory structure, writing policy, and stage-specific evidence usage under fixed upstream settings.
4.4. Ablation and Sensitivity Analysis
The ablation study is organized to validate the contributions of HMP and to characterize their practical operating range. Concretely, eight questions are examined: (1) whether reliability-controlled writing improves the baseline memory state, (2) whether multi-prototype long memory provides additional stable identity modeling, (3) whether short-queue residual recovery contributes beyond the long memory, (4) whether the frozen Stage 1 policy and Stage 2 risk controls are necessary, (5) whether the observed benefits persist across different tracking frameworks, (6) whether the pure module gain remains visible under fixed upstream pipelines, (7) whether the method shows cross-dataset transferability beyond MOT17/MOT20, and (8) how sensitive the method is to memory capacity, reliability weights, and admission thresholds, as well as what runtime and memory-footprint overhead HMP introduces. The purpose of this subsection is therefore not merely to show that HMP improves benchmark numbers, but to identify which part of the proposed memory lifecycle is responsible for that improvement. The controlled results thus form a direct claim-to-evidence chain: Table 4 and Table 5 provide stepwise component attribution for reliability-controlled writing, multi-prototype long memory, and short-queue recovery; Table 6 isolates the frozen association policy and residual-recovery constraints; Table 7 summarizes the pure memory-module gain under fixed host pipelines; Table 8 provides additional cross-dataset validation on DanceTrack-val; Figure 4 and Figure 5 together with Table 9 and Table 10 examine parameter sensitivity; and Table 11 quantifies runtime overhead and the subsequent feature-state memory analysis under matched GPU-workstation settings. The qualitative analysis in Section 4.5 further links the quantitative IDF1/IDSW changes to occlusion and reappearance stages.
4.4.1. Stepwise Component Ablation
To make the contribution of each component explicit, we report a stepwise component ablation under the same BoT-SORT-ReID pipeline. The Baseline uses the original single-prototype EMA-style appearance memory and does not use the proposed reliability-controlled writing or short queue. B1 adds the proposed reliability-controlled writing while keeping the single-prototype representation unchanged. A1 then replaces the single prototype with the multi-prototype long-term memory under the same reliability-controlled writing rule. A2 further introduces the short queue and risk-controlled Stage 2 residual recovery, resulting in the complete HMP configuration. Therefore, the transition from Baseline to Baseline + B1 measures the effect of reliability-controlled writing, the transition from Baseline + B1 to Baseline + B1 + A1 measures the effect of multi-prototype long-memory representation, and the transition from Baseline + B1 + A1 to Baseline + B1 + A1 + A2 measures the incremental effect of controlled short-term residual recovery.
The progression in Table 4 provides a clearer attribution chain for the proposed design. Adding B1 to the Baseline increases IDF1 from 81.8 to 82.1 and reduces IDSW from 135 to 129, while MOTA remains unchanged. Since the memory capacity and association schedule are unchanged in this comparison, the improvement can be attributed to reliability-controlled memory writing rather than to larger storage or an additional matching stage.
Adding A1 on top of B1 further improves IDF1 to 82.5 and reduces IDSW to 125. This indicates that, once unreliable observations are suppressed by the reliability-controlled writing rule, replacing the single prototype with a compact multi-prototype long memory further alleviates over-smoothing and representation drift. In other words, A1 mainly improves the coverage and stability of long-term identity anchors.
Finally, adding A2 yields the full HMP configuration, with MOTA remaining stable at 78.5, IDF1 further increasing to 82.7, and IDSW decreasing to 121. This shows that the short queue is valuable not as another long-term memory bank, but as a tightly constrained source of transitional evidence for residual recovery. Relative to the Baseline, the complete HMP configuration improves MOTA by 0.1, improves IDF1 by 0.9, and reduces IDSW by 14, or 10.4%, while keeping HOTA competitive. Overall, these results support the intended division of labor in HMP: B1 controls memory contamination, A1 strengthens stable long-term identity modeling, and A2 selectively recovers difficult residual cases without destabilizing the primary association stage.
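The stepwise attribution can be checked directly from the IDF1 and IDSW values quoted above. The following minimal sketch computes the per-step increments and the overall relative IDSW reduction; the variable and function names are illustrative, and only the numbers stated in the text are used.

```python
# (name, IDF1, IDSW) per configuration, as quoted in the text for Table 4.
steps = [
    ("Baseline", 81.8, 135),
    ("+B1", 82.1, 129),
    ("+B1+A1", 82.5, 125),
    ("+B1+A1+A2", 82.7, 121),
]


def stepwise_deltas(rows):
    """Return (name, delta_IDF1, delta_IDSW) for each consecutive pair."""
    out = []
    for (_, f_prev, s_prev), (name, f_cur, s_cur) in zip(rows, rows[1:]):
        out.append((name, round(f_cur - f_prev, 1), s_cur - s_prev))
    return out


deltas = stepwise_deltas(steps)
total_idsw_drop = steps[0][2] - steps[-1][2]
relative_drop_pct = round(100.0 * total_idsw_drop / steps[0][2], 1)
```

Each component contributes a same-sign change in both identity metrics, and the three IDSW decrements sum to the total reduction of 14 (10.4%).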
4.4.2. Cross-Framework Ablation
We further conduct experiments under the Deep OC-SORT framework to test whether the effect of HMP depends on a particular tracker implementation. Unless otherwise specified, the definitions of B1, A1, and A2 are kept the same as those under BoT-SORT-ReID; only the host tracking framework is changed. To maintain a consistent attribution protocol, we again report a stepwise component ablation rather than separating structural and writing effects into disconnected tables.
As shown in Table 5, the same component-level trend is observed under Deep OC-SORT. Adding B1 to the Baseline increases IDF1 from 82.5 to 82.8 and reduces IDSW from 187 to 176, while MOTA and HOTA remain stable. Since the memory representation and association schedule are unchanged in this comparison, this improvement again supports the value of reliability-controlled memory writing for suppressing contaminated updates.
Adding A1 on top of B1 further improves IDF1 to 83.0 and HOTA to 70.8, while reducing IDSW to 175. This indicates that the multi-prototype long memory remains beneficial under a different host tracker by providing richer and more stable long-term identity anchors. Finally, adding A2 yields the complete HMP configuration, reaching 80.0 MOTA, 83.1 IDF1, 70.9 HOTA, and 173 IDSW. The additional reduction in IDSW shows that short-queue-based residual recovery is still useful when it is constrained by the same frozen two-stage policy.
Compared with the Baseline, the complete HMP configuration improves MOTA by 0.2, IDF1 by 0.6, and HOTA by 0.7, while reducing IDSW by 14 under Deep OC-SORT. Together with the BoT-SORT-ReID results, these cross-framework experiments show that the benefit of HMP is not tied to a particular tracker implementation. Instead, the improvement comes from a transferable memory-organization principle: B1 controls memory contamination, A1 improves stable long-term identity modeling, and A2 provides conservative short-term recovery for difficult residual cases.
4.4.3. Stage-Policy Ablation
The stepwise component ablation above verifies the progressive contribution of reliability-controlled writing, multi-prototype long memory, and short-queue-based residual recovery. However, it does not fully answer whether the conservative stage policy itself is necessary once these memory components are already enabled. Therefore, we conduct an additional stage-policy ablation under BoT-SORT-ReID on the MOT17 validation split. All variants in this comparison use the same memory components as the full HMP configuration, namely reliability-controlled writing, multi-prototype long memory, and the short queue; only the policy that controls how Stage 2 evidence is allowed to affect matching is changed.
Specifically, the Without Stage 1 freezing variant removes the rule that Stage 1 matches are finalized before residual recovery, allowing short-term evidence to compete with or modify associations that would otherwise have been fixed by long-memory primary matching. The Without advantage margin variant removes Equation (20), so Stage 2 recovery no longer requires the short-queue distance to be clearly better than the corresponding long-memory distance. The Without ambiguity gap variant removes Equation (22), making Stage 2 less selective when multiple residual candidates have similar short-queue distances. This ablation therefore evaluates whether HMP benefits simply from adding a second matching pass, or from constraining that pass with a conservative trust hierarchy.
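To make the two Stage 2 conditions concrete, the following minimal sketch applies an advantage-margin test and an ambiguity-gap test to the residual candidates of one unmatched track. Equations (20) and (22) are not reproduced in this section, so the additive forms used here, as well as the names `stage2_accept`, `margin`, and `gap`, are assumptions for illustration rather than the exact rules of the method.

```python
from typing import List, Optional


def stage2_accept(
    short_dists: List[float],
    long_dists: List[float],
    margin: float,
    gap: float,
) -> Optional[int]:
    """Return the index of the accepted residual candidate, or None.

    short_dists[i] and long_dists[i] are the short-queue and long-memory
    distances between one unmatched track and residual detection i.
    """
    if not short_dists:
        return None
    order = sorted(range(len(short_dists)), key=short_dists.__getitem__)
    best = order[0]
    # Advantage margin (assumed form): the short-queue distance must beat
    # the long-memory distance by at least `margin`.
    if short_dists[best] > long_dists[best] - margin:
        return None
    # Ambiguity gap (assumed form): the runner-up must be clearly worse.
    if len(order) > 1 and short_dists[order[1]] - short_dists[best] < gap:
        return None
    return best
```

Because the function only receives tracks and detections left unmatched after Stage 1, it cannot overturn a primary match, which mirrors the frozen Stage 1 policy. Setting `margin` or `gap` to zero corresponds loosely to the two relaxed variants in the ablation.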
As shown in Table 6, removing the frozen Stage 1 policy increases IDSW from 121 to 134 and reduces IDF1 from 82.7 to 82.2. This confirms that the frozen design is not merely an implementation detail: it prevents short-term transitional evidence from overturning high-confidence matches already established by stable long-memory anchors. In other words, Stage 2 is useful as a residual-recovery mechanism, but it becomes risky when it is allowed to interfere with the primary association result. Removing the advantage-margin condition also increases IDSW from 121 to 131 while reducing MOTA from 78.5 to 78.4 relative to the full HMP setting. This indicates that a more permissive recovery policy does not bring a better overall trade-off, and instead weakens identity preservation by accepting riskier residual matches. Similarly, removing the ambiguity-gap test increases IDSW to 128, showing that Stage 2 should reject residual cases in which the best and second-best short-term candidates are not sufficiently separated. Overall, these results support the intended trust hierarchy of HMP: stable long-memory evidence should dominate primary matching, while short-term evidence should only supplement unresolved residual cases when it provides a clear and discriminative advantage.
4.4.4. Pure Module-Gain Summary Under Fixed Pipelines
To further separate module-level attribution from whole-system benchmark positioning, we summarize the pure memory-module gain under the two controlled host pipelines. In this comparison, the baseline and the full HMP variant share the same detector, ReID extractor, motion prediction/gating, assignment primitive, and track life-cycle rules; the difference is restricted to the track-level memory representation, writing policy, and stage-specific evidence usage introduced by HMP. Therefore, Table 7 should be read as the most direct experimental evidence for the effect of the proposed memory redesign.
As shown in Table 7, the full HMP configuration improves MOTA, IDF1, and HOTA under both host trackers while reducing IDSW. The absolute magnitude of the gain differs between the two frameworks because their baseline association behavior and identity-error profiles are different. However, the direction of improvement is consistent: replacing the original memory mechanism with HMP produces a clearer identity-continuity gain than a detection-oriented gain. This pattern directly supports the intended claim of the paper: the proposed module improves the track-memory lifecycle under fixed upstream conditions, rather than relying on a different detector, a stronger ReID extractor, or a separate end-to-end pipeline.
4.4.5. Cross-Dataset Generalization on DanceTrack
To further evaluate cross-dataset generalization beyond MOT17 and MOT20, we additionally evaluate HMP on the DanceTrack validation set [35] under the Deep OC-SORT framework. DanceTrack is complementary to MOT17/MOT20 because it contains targets with relatively uniform appearance and diverse motion patterns, making identity association less dependent on strong appearance discrimination and more sensitive to motion consistency and temporal evidence usage. Therefore, this experiment provides a useful stress test for HMP under weak appearance discrimination and complex motion. It is not intended as a new official leaderboard comparison, but as a controlled cross-dataset validation. The host tracker, evaluation protocol, and upstream configuration are kept consistent between the baseline and HMP variant, and no dataset-specific retuning of HMP parameters is performed.
As shown in Table 8, adding HMP to Deep OC-SORT improves MOTA from 88.5 to 88.9, HOTA from 58.51 to 58.70, and IDF1 from 59.03 to 59.43, while reducing IDSW from 1587 to 1543. The reduction of 44 identity switches corresponds to a 2.77% decrease relative to the baseline. Although the absolute improvement is moderate, the trend is consistent with the results on MOT17 and MOT20: HMP mainly improves identity-oriented behavior while keeping overall tracking quality stable.
This result provides additional cross-dataset evidence that the proposed memory mechanism is not restricted to a single benchmark. At the same time, we interpret this experiment cautiously. It validates HMP under one additional dataset and one host tracker, but it does not fully cover all possible domain shifts such as driving scenarios, low-resolution surveillance, or cross-modal tracking. Broader validation on more datasets remains an important direction for future work.
4.4.6. Parameter Sensitivity
We next examine how the number of long-term prototypes M and the short-queue length S affect identity-oriented performance on MOT17 under the BoT-SORT-ReID framework. Rather than treating these curves as single-metric tuning results, we interpret them through IDF1 and IDSW jointly, because the purpose of HMP is to improve identity continuity rather than to optimize one scalar score in isolation. The goal of this subsection is therefore to determine whether the gain of HMP comes from a compact, well-structured memory design or merely from increasing memory capacity. Each operating point requires rerunning the controlled tracker under a different memory configuration, so the curves are reported as validation trends for identifying a stable working region rather than as claims of statistical significance between neighboring points.
As shown in Figure 4, the effect of M is most informative when IDF1 and IDSW are read together. Increasing M from 1 to 3 raises IDF1 from 82.1 to 82.5 while simultaneously reducing IDSW from 129 to 125, indicating that a small set of long-term prototypes already provides sufficient diversity to capture recurring appearance modes of the same target. This is the most favorable operating region because the improvement is supported by both identity-quality indicators: association quality improves while switch errors decrease. When M is increased beyond this range, however, the long-memory bank becomes overly fragmented. Updates are dispersed across too many modes, each prototype receives less stable reinforcement, and the long-term identity anchors become less reliable. Accordingly, IDF1 declines and IDSW rises again. The figure therefore supports a more specific conclusion than “more memory helps”: the value of multi-prototype long memory lies in maintaining a compact and reliable set of identity anchors that improves identity matching while suppressing harmful switches.
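The over-smoothing argument can be illustrated with a toy example. The sketch below compares a single averaged prototype with a two-prototype memory, assuming cosine distance and a nearest-prototype (minimum) aggregation. Both choices, and all names in the code, are assumptions for illustration; the actual distance and update rules are defined in Section 3.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def long_memory_distance(prototypes, feature) -> float:
    """Distance to the nearest prototype (assumed aggregation: minimum)."""
    if not prototypes:
        raise ValueError("at least one prototype is required")
    return min(cosine_distance(p, feature) for p in prototypes)


# Toy 2-D features: two distinct appearance modes of the same target.
mode_a, mode_b = [1.0, 0.0], [0.0, 1.0]
averaged = [[0.5, 0.5]]           # single prototype smoothed over both modes
multi = [mode_a, mode_b]          # one prototype per mode

query = [0.0, 1.0]                # target reappears in mode B
d_single = long_memory_distance(averaged, query)
d_multi = long_memory_distance(multi, query)
```

In this toy case the averaged prototype sits between the two modes and matches neither exactly, whereas the two-prototype memory matches the reappearing mode directly.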
As shown in Figure 5, the role of S is likewise clearer when both IDF1 and IDSW are considered together. Starting from very small values, increasing S improves IDF1 and reduces IDSW, which suggests that a short queue is useful for covering brief occlusions, rapid pose changes, and other local appearance transitions that should not be written directly into long-term memory. The best trade-off appears at the default queue length, where IDF1 reaches one of its highest observed levels and IDSW attains the minimum observed value. A longer queue maintains a similar IDF1 but yields a higher IDSW: once S becomes too large, the queue retains more stale or noisy transitional evidence, which weakens its discriminative value for Stage 2 residual recovery. In that regime, the identity gain saturates and switch errors begin to rise again. This trend is consistent with the intended role of the short queue: it should operate as a compact buffer for recent transitions, rather than gradually turning into another unconstrained long-term memory.
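The "compact buffer" behavior can be sketched with a fixed-capacity queue. The class below assumes first-in, first-out eviction once S entries are stored; the class name, its methods, and the eviction rule are illustrative assumptions, not the implementation used in the experiments.

```python
from collections import deque
from typing import List, Sequence, Tuple


class ShortQueue:
    """Fixed-capacity buffer of recent (frame_id, feature) pairs."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._items: deque = deque(maxlen=capacity)

    def push(self, frame_id: int, feature: Sequence[float]) -> None:
        self._items.append((frame_id, list(feature)))

    def entries(self) -> List[Tuple[int, List[float]]]:
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)


queue = ShortQueue(capacity=3)
for frame in range(5):
    queue.push(frame, [float(frame), 0.0])
kept_frames = [frame_id for frame_id, _ in queue.entries()]
```

With a small capacity, only the most recent transitions survive, so stale evidence cannot accumulate; a larger capacity keeps older entries in play, which is the regime where the sensitivity curve shows IDSW rising again.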
Taken together, Figure 4 and Figure 5 indicate that the main gain of HMP does not depend on large memory capacity. A moderate configuration already captures most of the identity benefit, which is why we use the moderate values of M and S identified above as the default settings. This operating point offers a favorable balance among IDF1, IDSW, and practical cost.
We next evaluate whether HMP depends strongly on a narrowly tuned reliability-weight setting. The default setting assigns the largest weight to appearance consistency, a secondary weight to motion consistency, and a smaller weight to detection quality. We compare it with balanced, appearance-heavy, motion-heavy, and detection-heavy variants while keeping all other parameters unchanged.
Table 9 shows that the default setting provides the best IDF1/IDSW trade-off among the tested configurations, but the improvement trend does not collapse under moderate weight perturbations. Appearance-heavy weighting remains close to the default setting, which is reasonable because HMP mainly targets track-level appearance memory. Balanced and motion-heavy variants still improve identity continuity compared with the baseline, but their IDSW values are higher than the default. The detection-heavy setting attains comparable MOTA but weakens IDF1, HOTA, and IDSW, indicating that detector confidence alone is insufficient for reliable memory writing. Overall, the results suggest that HMP benefits from appearance-dominant reliability fusion but is not overly dependent on a single fragile weight setting.
We also examine the sensitivity of HMP to the two main memory-admission thresholds. These thresholds control the purity-adaptability trade-off: lower thresholds admit more observations but increase contamination risk, whereas higher thresholds preserve memory purity but may suppress useful adaptation.
As shown in Table 10, HMP remains effective within a moderate range of admission thresholds. Loose writing attains comparable MOTA, but it increases IDSW because some less reliable observations are allowed to affect long-term identity anchors or short-term recovery. Strict writing reduces the risk of contamination but weakens adaptability, leading to lower IDF1 and slightly worse overall metrics. The default configuration achieves the strongest identity-oriented trade-off, but neighboring settings remain close, indicating that the method is not dependent on a narrowly tuned threshold pair.
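The interaction between reliability weights and admission thresholds can be sketched as a weighted fusion followed by a two-level gate. The weights, the thresholds, and the rule that one threshold gates long-memory writing while a lower one gates the short queue are all assumptions made for illustration; the actual definitions and default values are given in Section 3 and Appendix A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityWeights:
    appearance: float
    motion: float
    detection: float

    def normalized(self) -> "ReliabilityWeights":
        total = self.appearance + self.motion + self.detection
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return ReliabilityWeights(
            self.appearance / total, self.motion / total, self.detection / total
        )


def reliability(w: ReliabilityWeights, app: float, mot: float, det: float) -> float:
    """Weighted fusion of three cue scores, each assumed to lie in [0, 1]."""
    w = w.normalized()
    return w.appearance * app + w.motion * mot + w.detection * det


def admit(score: float, tau_long: float, tau_short: float) -> str:
    """Assumed gate: 'long' if score >= tau_long, else 'short' if >= tau_short."""
    if score >= tau_long:
        return "long"
    if score >= tau_short:
        return "short"
    return "reject"


# Hypothetical appearance-dominant weights and thresholds (not the paper's values).
weights = ReliabilityWeights(appearance=0.5, motion=0.3, detection=0.2)
score = reliability(weights, app=0.9, mot=0.8, det=0.6)
decision = admit(score, tau_long=0.7, tau_short=0.4)
```

Lowering both thresholds in this sketch admits more observations into memory (the "loose" regime), while raising them rejects more (the "strict" regime), which is the trade-off examined in the threshold sweep.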
Taken together, the parameter analyses support three observations. First, HMP prefers an appearance-dominant reliability fusion because the proposed module operates at the appearance-memory level. Second, the method remains stable under moderate perturbations of reliability weights and admission thresholds, suggesting that the observed gains are not caused by a fragile tuning point. Third, increasing memory capacity alone is insufficient; compact memory with role separation and reliability-controlled writing is more important than simply storing more features. Although the sensitivity sweeps are conducted on the MOT17 validation split to avoid repeated official-server submissions, the same default HMP parameter set is used for the MOT17/MOT20 benchmark submissions and for the additional DanceTrack-val check. The competitive identity-oriented behavior across these settings therefore provides practical evidence that the default setting is not narrowly dataset-specific.
4.4.7. Runtime and Memory-Footprint Overhead
To evaluate the practical deployment cost of HMP, we report the observed inference speed under matched detector, ReID extractor, input, and hardware settings, and we further analyze the deterministic memory footprint introduced by the HMP state itself. For each framework, the baseline and +HMP variants are timed under the same code path. The reported frames per second (FPS) reflects end-to-end tracking throughput rather than isolated HMP module timing.
All runtime measurements in this subsection were obtained on the workstation described in Section 4.2 (NVIDIA GeForce RTX 3090 GPU and Intel Core i7-10700 CPU). The reported values are intended for controlled within-framework comparison under our implementation and hardware setting, rather than for direct cross-paper speed comparison, because FPS values are sensitive to the detector/ReID configuration, input resolution, precomputed inputs, hardware platform, and timing protocol.
As shown in Table 11, HMP introduces a moderate but predictable runtime overhead under the matched GPU-workstation setting. Under BoT-SORT-ReID, the end-to-end speed decreases from 7.1 FPS to 6.4 FPS after inserting HMP, corresponding to a relative slowdown of 9.9%. Under Deep OC-SORT, the speed decreases from 22.2 FPS to 19.3 FPS, corresponding to a relative slowdown of 13.1%. These FPS values include detection, ReID extraction, motion prediction, association, and memory maintenance, rather than isolated timing of the HMP module alone.
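The relative slowdowns follow directly from the reported FPS pairs. The minimal sketch below reproduces them and also converts each pair into the implied additional end-to-end time per frame; the function names are illustrative, and the per-frame figures are derived from the FPS values rather than measured separately.

```python
def relative_slowdown_pct(fps_base: float, fps_new: float) -> float:
    """Relative FPS decrease in percent, measured against the baseline."""
    if fps_base <= 0:
        raise ValueError("fps_base must be positive")
    return 100.0 * (fps_base - fps_new) / fps_base


def added_ms_per_frame(fps_base: float, fps_new: float) -> float:
    """Extra end-to-end milliseconds per frame implied by the FPS change."""
    if fps_base <= 0 or fps_new <= 0:
        raise ValueError("FPS values must be positive")
    return 1000.0 / fps_new - 1000.0 / fps_base


# FPS pairs quoted in the text for Table 11.
botsort_slowdown = relative_slowdown_pct(7.1, 6.4)
deepoc_slowdown = relative_slowdown_pct(22.2, 19.3)
botsort_extra_ms = added_ms_per_frame(7.1, 6.4)
deepoc_extra_ms = added_ms_per_frame(22.2, 19.3)
```

Under these definitions, the FPS pairs imply roughly 15.4 ms and 6.8 ms of additional end-to-end time per frame for BoT-SORT-ReID and Deep OC-SORT, respectively.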
In addition to runtime, we analyze the incremental memory footprint introduced by HMP. The module does not add an extra neural network, learnable temporal-attention block, or large external feature bank. Its persistent state mainly consists of long-memory prototypes and short-queue entries maintained for active tracks. Let K denote the number of active tracks and d denote the ReID feature dimension. In our implementation, each stored feature element uses single-precision storage, i.e., 4 bytes per dimension. The additional persistent feature-state memory is approximately
$$4\,K\,(M + S)\,d \ \text{bytes}, \tag{25}$$
excluding a small number of scalar prototype statistics for support, recent access frequency, and inactivity. With the default values of M and S, the per-track cost 4(M + S)d is a fixed multiple of the feature dimension. For example, the additional persistent memory is about 72 KiB per active track, which corresponds to roughly 18,432 stored single-precision elements per track, or about 7.0 MiB for 100 active tracks. Therefore, the per-track HMP-specific persistent state is bounded by the fixed memory capacities M and S, and the total feature-state memory grows linearly with the number of active tracks and the feature dimension.
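The footprint estimate can be reproduced with a few lines of arithmetic, assuming each track stores M + S feature vectors of dimension d at 4 bytes per element, as described above. The specific values M = 3, S = 6, and d = 2048 used below are assumptions chosen because they reproduce the quoted 72 KiB per track; they are not stated at this point in the text.

```python
BYTES_PER_ELEMENT = 4  # single-precision storage


def hmp_state_bytes(
    num_tracks: int,
    num_prototypes: int,
    queue_len: int,
    feat_dim: int,
    bytes_per_element: int = BYTES_PER_ELEMENT,
) -> int:
    """Persistent feature-state memory: bytes * K * (M + S) * d."""
    return bytes_per_element * num_tracks * (num_prototypes + queue_len) * feat_dim


# ASSUMED values (not stated here): chosen to match the quoted 72 KiB per track.
M, S, D = 3, 6, 2048

per_track_kib = hmp_state_bytes(1, M, S, D) / 1024
per_100_tracks_mib = hmp_state_bytes(100, M, S, D) / 1024**2
```

Because the expression is linear in K and d, doubling either the number of active tracks or the feature dimension doubles the footprint, while M and S act as fixed per-track multipliers.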
The lower absolute FPS of BoT-SORT-ReID mainly reflects its heavier detector–ReID stack, not an exceptional HMP penalty. In contrast, Deep OC-SORT has a lighter end-to-end execution profile under the same hardware setting and therefore achieves higher absolute FPS. In both frameworks, the relative slowdown remains bounded, suggesting that the additional cost introduced by HMP mainly comes from feature-to-memory distance computation and lightweight prototype/queue maintenance rather than from a major change to the upstream tracking pipeline.
Peak GPU-memory consumption of the full tracker is strongly affected by the detector/ReID backbone, CUDA memory caching, input resolution, batching strategy, and implementation backend. Therefore, a single peak-memory number from one workstation would not provide a general deployment claim. We instead report the deterministic incremental HMP state memory in Equation (25), while leaving broader peak GPU-memory profiling on different accelerators to future hardware-aware evaluation.
For practical real-time edge deployment, the heavy detector–ReID configuration used in the workstation benchmark should not be directly transferred to low-power devices. An edge-oriented implementation should pair HMP with lightweight detectors and lightweight ReID extractors, such as compact detection backbones, reduced input resolution, model quantization, pruning, or accelerated inference backends. In such a setting, the end-to-end FPS is expected to depend primarily on the upstream detector–ReID stack, while the additional cost of HMP remains bounded by the number of gated candidate associations and the small memory sizes M and S.
Overall, under the evaluated GPU-workstation setting, HMP accounts for a bounded share of the end-to-end runtime, and its persistent-state memory footprint is bounded and predictable. This additional cost is consistent with the linear-complexity analysis in Section 3.6. Nevertheless, these measurements should not be interpreted as a complete deployment benchmark for all hardware platforms. A systematic evaluation with lightweight detectors, lightweight ReID models, embedded accelerators, and peak GPU-memory profiling remains necessary before making stronger claims about real-time edge deployment.