CharSPBench: An Interaction-Aware Micro-Architecture Characterization Framework for Smartphone Benchmarks

Ouyang, Chenghao; Yang, Zhong; Li, Guohui

doi:10.3390/electronics15020432

Open AccessArticle

CharSPBench: An Interaction-Aware Micro-Architecture Characterization Framework for Smartphone Benchmarks

by

Chenghao Ouyang

¹

,

Zhong Yang

²

and

Guohui Li

^1,*

¹

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

²

School of Software Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

^*

Author to whom correspondence should be addressed.

Electronics 2026, 15(2), 432; https://doi.org/10.3390/electronics15020432

Submission received: 22 December 2025 / Revised: 11 January 2026 / Accepted: 15 January 2026 / Published: 19 January 2026

(This article belongs to the Section Computer Science & Engineering)

Download

Browse Figures

Versions Notes

Abstract

Mobile application workloads are inherently driven by user interactions and are characterized by short execution phases and frequent behavioral changes. These properties make it difficult for traditional micro-architecture analysis approaches, which typically assume stable execution behavior, to accurately capture performance bottlenecks in realistic mobile scenarios. To address this challenge, this paper presents CharSPBench, an interaction-aware micro-architecture characterization framework for analyzing mobile benchmarks under representative user interaction scenarios. CharSPBench organizes micro-architecture performance events in a structured and semantically consistent manner. It further enables systematic attribution of performance bottlenecks across different interaction conditions. The framework further supports intensity-based workload analysis to identify workload tendencies, such as memory-intensive and frontend-bound behaviors, under interaction-driven execution. Using the proposed framework, 126 micro-architecture performance events are systematically organized. This process leads to the identification of 19 key, semantically non-redundant features, further grouped into five major micro-architecture subsystems. Based on this structured representation, eight representative interaction-dependent micro-architecture insights are extracted to characterize performance behavior across mobile benchmarks. These quantitative results demonstrate that CharSPBench complements existing micro-architecture analysis techniques and provides practical support for interaction-aware benchmark design and mobile processor performance evaluation.

Keywords:

mobile benchmarks; user interaction; micro-architecture characterization; performance events

1. Introduction

Mobile application workloads are inherently driven by user interactions [1,2,3]. Unlike traditional task-oriented workloads, which exhibit relatively stable execution phases, interaction-driven mobile workloads behave very differently. They are characterized by short execution intervals, diverse execution paths, and frequent behavior transitions triggered by user input events [4,5,6]. Such interactions typically involve coordinated activities across the software stack, including user interface rendering, system service invocation, and thread scheduling [7]. As a result, processor behavior under interactive execution is jointly shaped by multiple micro-architecture subsystems, including instruction fetch, cache hierarchies, branch prediction, and execution pipelines. This tight coupling between software dynamics and hardware execution poses a fundamental challenge for understanding micro-architecture performance bottlenecks in interaction-driven mobile workloads.

Despite the importance of interaction-driven behavior, achieving reliable micro-architecture characterization under realistic user interactions remains challenging. Interaction driven execution exhibits short and highly dynamic execution phases, which complicate micro-architecture-level analysis. Mobile performance evaluation commonly focuses on end-to-end performance comparison or user-perceived metrics, rather than fine-grained micro-architecture analysis [1]. As a result, accurate extraction and interpretation of micro-architecture bottlenecks from realistic mobile workloads remain difficult in practice. This challenge highlights the need for an interaction-aware characterization approach that can systematically interpret micro-architecture behavior under realistic mobile workloads.

To enable micro-architecture analysis under realistic user interactions, this study builds upon SPBench, a mobile benchmark suite that explicitly incorporates representative interaction scenarios [1,8,9]. SPBench replays representative interaction patterns, such as sliding, switching, and quenching, across a large collection of Android applications. It collects micro-architecture performance counters on commercial mobile processors with low overhead, enabling systematic observation of processor behavior under realistic usage conditions. However, while such interaction-aware benchmarks expose rich micro-architecture performance data, the impact of different interaction modes on micro-architecture behavior has not yet been systematically analyzed. Without a dedicated interaction-aware analysis methodology, interpreting these performance variations remains challenging.

Our preliminary analysis using SPBench reveals that the same application can exhibit markedly different micro-architecture behaviors under different interaction scenarios. Metrics related to cache misses, branch mispredictions, and address translation overhead vary significantly between sliding, switching, and quenching interactions. Further investigation suggests that operating system scheduling effects, memory access patterns, and thread context switching play a central role in shaping these differences [10,11,12,13]. For example, sliding interactions often preserve cache locality across successive rendering frames, whereas switching and quenching interactions tend to introduce frequent context switches, leading to cache invalidations and pipeline disruptions. These observations indicate that interaction-specific execution characteristics fundamentally influence micro-architecture behavior and motivate the need for a systematic approach to attribute performance variations to user interactions.

The primary objective of this study is to propose and validate CharSPBench, an interaction-aware micro-architecture characterization framework for interaction-driven smartphone workloads. To address the above challenges, this paper proposes CharSPBench, an interpretable micro-architecture bottleneck characterization approach for interaction-driven mobile workloads. CharSPBench is grounded in micro-architecture performance events that can be stably observed under realistic user interaction scenarios, ensuring applicability in practical mobile environments. By organizing these events through structured feature modeling and analysis, the proposed approach enables low-overhead characterization and attribution of micro-architecture behavior for different benchmarks across multiple interaction scenarios. By explicitly accounting for the impact of user interactions on program execution, CharSPBench provides an interpretable framework for understanding micro-architecture performance behavior in interaction-driven mobile systems.

Beyond generic performance evaluation, CharSPBench is designed to bridge micro-architecture characterization and user interaction–driven system behavior in smartphone workloads. For hardware architects, the framework provides interaction-aware, subsystem-level insights that reveal how different user behaviors induce distinct micro-architecture pressure patterns under realistic usage scenarios. For OS and system developers, CharSPBench links interaction-triggered execution segments with interpretable micro-architecture behavior, offering a hardware-grounded perspective on how scheduling and execution state transitions influence performance. By connecting micro-architecture analysis with interaction contexts, CharSPBench provides a common analytical basis for cross-layer performance understanding.

The main contributions and innovations of this paper are summarized as follows:

CharSPBench is proposed as an interpretable micro-architecture bottleneck characterization framework for interaction-driven mobile workloads, addressing the limited applicability of existing analysis methods under realistic user interaction scenarios.
An Intensity-Level Characterization (ILC) method is introduced to enable intensity-aware workload characterization across different benchmarks and interaction scenarios, facilitating the identification of dominant execution tendencies such as memory-intensive and frontend-bound behavior.
A systematic micro-architecture analysis is conducted on multiple commercial mobile processor platforms under representative interaction scenarios, including sliding, switching, and quenching, from which eight representative micro-architecture performance insights are distilled.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 introduces the background and motivation. Section 4 presents the proposed CharSPBench methodology. Section 5 describes the experimental setup. Section 6 reports the experimental results and insights. Section 7 concludes the paper.

2. Related Work

Micro-architecture performance analysis and workload characterization have been widely studied for bottleneck identification and performance understanding. Prior work mainly differs in analysis granularity, modeling objectives, and platform assumptions. In interaction-driven mobile applications, user actions trigger short-lived execution segments whose micro-architectural behaviors can vary rapidly, challenging approaches that assume stable or coarse-grained execution behavior. Existing analysis techniques therefore face inherent trade-offs when applied to interactive settings. Lightweight metrics such as IPC or individual MPKI values are easy to deploy but offer limited interpretability [14]. Instrumentation- or simulation-based approaches, on the other hand, provide fine-grained insights at the cost of high overhead and limited scalability under realistic interaction-driven scenarios [15]. These studies are discussed to illustrate general methodological trade-offs and are not included as direct comparison targets, while Table 1 compares representative approaches using the analysis dimensions adopted in this work.

Architectural bottleneck analysis. Several studies focus on micro-architecture bottleneck attribution with strong interpretability. Weingarten et al. proposed a CPU-oriented top-down microarchitectural analysis framework for bottleneck attribution [14]. Jang et al. introduced RPStacks-MT for multicore bottleneck analysis using stacked performance models [16]. Bai et al. integrated bottleneck reasoning into micro-architecture design space exploration through BOOM-Explorer and ArchExplorer [17,18]. These approaches are effective under stable execution contexts but typically operate at application or phase granularity, which limits their ability to capture interaction-induced transient behaviors in mobile workloads.

Workload characterization and behavior modeling. Another line of work emphasizes workload behavior characterization and feature-based modeling. Criswell and Adegbija surveyed phase classification techniques based on performance events [19]. Wang et al. characterized job-level micro-architecture behaviors using large-scale traces [20], and Schall et al. analyzed short-lived workloads in serverless environments [21]. While these studies capture behavior diversity across phases or jobs, their temporal abstractions are misaligned with interaction-triggered execution segments. As a result, interaction-induced variations are often averaged out rather than explicitly modeled.

Platform scope and emerging workloads. Recent work has also examined micro-architecture characteristics of emerging workloads, such as AI applications on GPU or server platforms. However, these approaches typically target long-running, accelerator-centric workloads and do not consider mobile-specific constraints or interaction-driven execution dynamics, limiting their applicability to mobile user-facing performance analysis.

Positioning of CharSPBench. As summarized in Table 1, existing approaches rarely treat interaction as a first-class modeling objective for mobile micro-architecture analysis. CharSPBench addresses this gap by combining systematic feature selection, architectural interpretability, and fine-grained behavior characterization under explicit interaction modeling. Recent studies have also explored scheduling and system-level performance optimization in mobile and embedded environments; however, they remain largely task- or system-centric and do not address interaction-driven micro-architecture characterization [22,23,24,25,26]. By complementing prior work with interaction-triggered execution segments as the analysis unit, CharSPBench enables interpretable characterization of micro-architecture behavior in interaction-driven mobile applications. CharSPBench does not aim to replace existing micro-architecture analysis methodologies. Instead, it complements prior approaches, including top-down bottleneck analysis, by providing interaction-aware and fine-grained execution characterization for higher-level performance analysis.

Table 1. Capability comparison between CharSPBench and representative workload characterization and micro-architecture analysis approaches.

	Feature Sel.	Arch. Interp.	Fine-Grained Char.	Mobile-Aware	Interaction-Aware
Weingarten [14]	×	✓	×	×	×
Jang [16]	✓	✓	×	×	×
Criswell [19]	✓	×	✓	×	×
Li [27]	✓	×	×	×	×
Bai [17]	✓	✓	×	×	×
Wang [20]	✓	×	×	×	×
Bai [18]	✓	✓	×	×	×
Schall [21]	✓	✓	✓	×	×
CharSPBench	✓	✓	✓	✓	✓

Feature Sel. denotes feature selection, Arch. Interp. denotes micro-architecture interpretability, Fine-grained Char. denotes fine-grained behavior characterization, Mobile-aware denotes mobile-oriented design considerations, and Interaction-aware denotes explicit interaction behavior modeling. Symbols ✓ and × indicate whether the corresponding aspect is a core design objective or not.

3. Background and Motivation

3.1. SPBench Overview

SPBench is an interaction-driven mobile benchmark infrastructure designed to support micro-architecture performance analysis under realistic user behavior [1]. Unlike conventional mobile CPU benchmark suites, which are typically designed to report end-to-end performance scores or user-perceived metrics [28], SPBench focuses on exposing micro-architecture behavior during interactive execution. It is constructed as a compact yet representative benchmark set derived from a large corpus of real-world Android applications. Specifically, SPBench consists of 15 benchmark workloads selected from 100 popular Android applications, each exercised under three representative interaction scenarios: sliding, switching, and quenching. These 15 benchmarks are selected from the 100 most popular mobile applications ranked on Google Play and the Apple App Store. Together, they form a benchmark suite designed to represent the micro-architecture behavior of the full application set across three interaction scenarios on four representative mobile platforms. Each benchmark in SPBench therefore corresponds to an application–interaction pair, enabling systematic observation of processor behavior under execution patterns that closely resemble everyday mobile usage.

As shown in Figure 1, many existing mobile CPU benchmark suites employ only a limited number (0–13) of PMU events, which constrains their ability to expose micro-architecture behavior under realistic execution conditions [29,30,31,32]. In contrast, SPBench provides substantially broader PMU event coverage by incorporating 126 micro-architecture performance events, making it more suitable for micro-architecture-oriented analysis. A key characteristic of SPBench is its emphasis on micro-architecture performance events as primary analysis signals. Mainstream mobile CPU benchmark suites differ significantly in the number of micro-architecture events they utilize. Many benchmarks rely on only a small subset of events, and some do not employ micro-architecture events at all. This indicates that most existing benchmarks are not designed with micro-architecture characterization as a central objective. As a result, their ability to support fine-grained bottleneck analysis under interaction-driven workloads is inherently limited. The broader event coverage provided by SPBench, based on 126 PMU events, enables detailed analysis of cache behavior, branch execution, address translation, and pipeline activity across different interaction scenarios, making it a suitable foundation for the interaction-aware micro-architecture characterization and analysis presented in this work.

3.2. Algorithms Used in This Study

This section introduces several lightweight algorithms that are used in the CharSPBench framework to support structured analysis of micro-architecture performance events.

3.2.1. Stochastic Gradient Boosting Regression Trees

Stochastic Gradient Boosting Regression Trees (SGBRT) is an ensemble learning method that combines regression trees with gradient boosting to model complex, non-linear relationships between input features and a target variable [33,34]. Instead of constructing a single predictive model, SGBRT incrementally builds an additive ensemble by fitting successive trees to the residuals of previous models, thereby improving model expressiveness through stage-wise optimization. SGBRT is adopted in this study due to its ability to efficiently model non-linear relationships among micro-architecture events while maintaining robustness to the noise and execution variability inherent in interaction-driven PMU measurements.

Formally, given a dataset

{(x_{i}, y_{i})}_{i = 1}^{N}

, where

x_{i} = (x_{i, 1}, x_{i, 2}, \dots, x_{i, d})

denotes a feature vector and

y_{i}

represents the corresponding response variable, the boosted model can be expressed as

{\hat{y}}_{i} = \sum_{m = 1}^{M} f_{m} (x_{i}),

(1)

where each

f_{m} (\cdot)

denotes a regression tree learned sequentially to minimize a predefined loss function.

An important property of SGBRT is its ability to quantify feature importance as a byproduct of model construction. Feature importance is commonly estimated by aggregating the contribution of each feature to loss reduction across all tree splits in the ensemble. Features that frequently participate in splits and lead to larger loss reductions are considered more influential in explaining the variation of the target variable.

3.2.2. Z-Score Normalization

Z-score normalization is a standard statistical technique used to transform numerical features onto a common scale [35]. It normalizes a feature by centering it around its mean and scaling it by its standard deviation, thereby eliminating differences caused by feature magnitude and units.

For a feature value x with mean

μ

and standard deviation

σ

, the normalized value z is defined as

z = \frac{x - μ}{σ} .

(2)

After transformation, the resulting feature distribution has zero mean and unit variance.

Z-score normalization is commonly applied when features exhibit heterogeneous ranges or variability. By preserving relative deviations while removing scale effects, it enables subsequent analysis to focus on intrinsic relationships rather than absolute values. This property makes Z-score normalization a widely adopted preprocessing step in performance analysis and statistical modeling.

4. CharSPBench Methodology

CharSPBench is a methodology for characterizing micro-architecture behavior of interaction-driven mobile workloads using SPBench as the benchmark substrate. While SPBench provides interaction-driven benchmark workloads and PMU data collection, CharSPBench focuses on the systematic characterization and interpretation of micro-architecture behaviors under such interactions. The key challenge is that interactive execution is short-phased and non-stationary, and performance events often change together across the pipeline and memory hierarchy. As a result, bottleneck interpretation based on isolated metrics or ad-hoc thresholds becomes unstable across interactions and platforms.

CharSPBench addresses this challenge by organizing PMU events into a structured analysis workflow. It first constructs a compact and interpretable event representation, and then performs intensity-level characterization to expose dominant execution tendencies (e.g., branch control and speculative execution) under each interaction scenario. Finally, it supports interaction-centric aggregation to compare event patterns across sliding, switching, and quenching, enabling consistent extraction of interaction-dependent micro-architecture insights. This methodology builds on lightweight learning and normalization primitives introduced in Section 3.2, while keeping the overall analysis applicable to commodity mobile SoCs. It is worth noting that the computational analysis in CharSPBench is lightweight and can be completed within minutes once the profiling data are available, while the time-consuming part of the workflow lies primarily in PMU data collection.

Figure 2 presents the overall workflow of CharSPBench. Interaction-driven executions in SPBench (e.g., sliding, switching, and quenching) are first monitored to collect micro-architecture PMU events. These events are then organized through structured event analysis, including important event identification using SGBRT and redundancy reduction with semantic grouping across major micro-architecture components (cache hierarchy, TLB, branch control, speculative execution, and memory interconnect). Based on the structured event representation, CharSPBench performs MIA-based visual analysis and interaction-driven insight analysis, and finally applies an Intensity-aware Load Characterization (ILC) framework to systematically characterize workload intensity and expose micro-architecture bottlenecks under different interaction scenarios.

Interaction-driven Event Observation: CharSPBench starts from interaction-driven event observation in order to capture activities at the micro-architecture level that are directly triggered by user operations. In this work, such activities are observed through hardware performance monitoring units, which expose micro-architecture events related to instruction execution, memory access, and control flow. In contrast to conventional workloads that assume relatively stable and long execution phases, mobile applications driven by user interaction typically execute in short bursts and frequently change their execution paths. Operations such as sliding, switching, and quenching activate different code regions associated with interface updates, page transitions, and system state changes. By monitoring PMU events during these interaction-driven executions, CharSPBench collects event traces that reflect the transient and non-stationary characteristics of real mobile usage.

PMU event collection is conducted separately under each interaction script, and no switching occurs across different interaction modes during a profiling run. For each benchmark, a single interaction type (i.e., sliding, switching, or quenching) is executed independently, and PMU events are collected only during the execution of that interaction. The next interaction is initiated only after the previous one has completed. Interaction behaviors are driven by scripted execution and are organized into fixed 10-s behavior simulation windows. PMU collection is synchronously performed for the same 10-s interval using the OCOE (One Counter One Event) mechanism. Each profiling run therefore captures a steady interaction-driven execution phase corresponding to a single interaction mode, without including pre-interaction, post-interaction, or cross-interaction transition states. Each benchmark–interaction pair is executed three times, and the reported PMU values are obtained by averaging across the repeated runs.

Structured Event Organization: While raw PMU events provide direct visibility into micro-architectural behavior, analyzing individual event values in isolation is often ineffective. This is due to differences in event definitions, correlated behaviors across subsystems, and variations in hardware implementations. To improve the interpretability and consistency of analysis, CharSPBench applies a structured organization to the collected events. Events that are closely related to performance variation are first identified, after which redundant information is reduced and remaining events are grouped according to their architectural semantics. Through this process, micro-architecture events are organized around major processor subsystems, including the cache hierarchy, TLB, branch control, speculative execution, and memory interconnect. The resulting representation converts a collection of loosely related counters into a coherent and stable event space that supports subsequent analysis.

Interaction-aware Bottleneck Characterization: Based on the structured event representation, CharSPBench proceeds to characterize performance bottlenecks at the micro-architecture level under different interaction scenarios. During interactive execution, pressure on processor subsystems changes rapidly as execution paths evolve in response to user input, which makes bottlenecks highly dependent on interaction context. CharSPBench addresses this challenge by combining visual analysis with interaction-driven insight analysis and by introducing an Intensity-aware Load Characterization framework to summarize execution intensity across subsystems. This interaction-aware characterization enables consistent comparison of behavior at the micro-architecture level across sliding, switching, and quenching scenarios and supports systematic identification of interaction-dependent bottlenecks.

4.1. Interaction-Driven Micro-Architecture Event Modeling and Structuring

To support micro-architecture bottleneck characterization under interaction-driven workloads, CharSPBench constructs a structured event representation from observed micro-architecture events. These events are collected through hardware performance monitoring mechanisms and capture fine-grained activities within the processor micro-architecture. While interaction-driven execution exposes rich performance behaviors, directly analyzing raw micro-architecture events is often ineffective due to the large event space, strong inter-event correlations, and vendor-specific event definitions. This subsection therefore focuses on how the observed event space is systematically modeled, filtered, and structured to obtain a stable and interpretable feature representation for subsequent analysis.

At the event modeling stage, CharSPBench employs a model-driven importance analysis to identify representative micro-architecture events. Given a set of observed micro-architecture events and a corresponding performance metric, a Stochastic Gradient Boosting Regression Tree (SGBRT) model is trained to quantify the contribution of each event to performance variation. Compared with linear models, SGBRT captures non-linear relationships between events and performance without imposing restrictive assumptions, making it suitable for heterogeneous mobile platforms. Unless otherwise specified, SGBRT is used with the default parameter settings provided by the standard implementation, and no parameter tuning is performed.

Let the complete micro-architecture event set be denoted as

E = {e_{1}, e_{2}, \dots, e_{M}},

(3)

where M is the number of observable micro-architecture events. Based on the trained SGBRT model, each event

e_{m}

is assigned an importance score

I_{m}

, reflecting its contribution to variations in the target performance metric. Events are then ranked according to their importance scores, and the top-ranked events are retained to form a reduced and representative event set:

E_{K} \subseteq E, | E_{K} | ≪ | E | .

(4)

This importance-based filtering step removes weakly relevant or redundant events and provides a compact input for subsequent semantic analysis. Importantly, the derived set

E_{K}

serves as the sole input to the following structuring process.

After obtaining the important event set, CharSPBench introduces semantic constraints to organize micro-architecture events in a structured manner. Specifically, events in

E_{K}

are grouped according to their associated processor subsystems. Let the semantic partition of the important event set be defined as

F = {F_{1}, F_{2}, \dots, F_{G}}, F_{i} \cap F_{j} = \emptyset (i \neq j),

(5)

where G denotes the number of micro-architecture subsystems. In this work,

G = 5

, corresponding to cache hierarchy, TLB, branch control, speculative execution, and memory interconnect. This semantic partition maps importance-filtered micro-architecture events onto subsystem-oriented views with clear hardware boundaries, enabling consistent interpretation across platforms.

Within each subsystem group, events may still exhibit redundancy due to overlapping semantics. CharSPBench therefore performs redundancy reduction within each semantic group. Specifically, for subsystem g, a reduced (de-redundant) representative set denoted by

F_{g}^{'}

is derived, where the prime

{(\cdot)}^{'}

indicates the set after redundancy reduction:

F_{g}^{'} \subseteq F_{g}, | F_{g}^{'} | ≪ | F_{g} | .

(6)

In practice, two events within the same subsystem are considered redundant when they reflect overlapping execution semantics and characterize the same micro-architectural behavior. When redundancy is identified, CharSPBench retains the event with the higher importance score and removes the other one.

Through the above importance-based filtering and semantic structuring process, the original micro-architecture event space is transformed into a structured feature representation with explicit subsystem affiliation, consistent semantics, and cross-platform comparability. This structured event representation provides a unified foundation for subsequent interaction-aware analysis, including MIA-based visual insight extraction and intensity-aware workload characterization.

4.2. MIA-Based Analysis Procedure for Mobile Micro-Architecture

Based on the structured micro-architecture event representation constructed in Section 4.1, CharSPBench applies Metric Importance Analysis (MIA) as an analysis procedure to examine interaction-dependent sensitivity patterns on mobile processors. MIA was originally proposed for server and data-center systems to quantify the relative contribution of performance metrics [36]. In this work, it is adopted as an analysis tool and applied to interaction-driven mobile micro-architecture behavior.

Mobile workloads exhibit short execution phases and frequent behavior shifts triggered by user interactions, leading to interaction-dependent pressure on micro-architecture subsystems. To capture such effects, MIA is applied within each semantic subsystem defined in Section 4.1, rather than across the entire feature space. Let

F_{g}^{'}

denote the redundancy-reduced event set of subsystem g. MIA is used to analyze the relative sensitivity of events in

F_{g}^{'}

with respect to a target performance metric under different interaction scenarios and devices.

Following the standard MIA formulation, event sensitivity is measured by aggregating the loss reduction contributed by an event across split nodes in an ensemble learning model. For a split node v, the loss reduction is defined as

Δ L (v) = L_{parent} - (L_{left} + L_{right}) .

(7)

The resulting sensitivity values are used solely for comparative analysis and visualization, rather than for further feature selection.

By applying MIA across different mobile devices and interaction types, CharSPBench reveals how the relative contribution of micro-architecture events within each subsystem varies with interaction context. These interaction-aware sensitivity patterns provide direct input to the intensity-aware load characterization framework described in Section 4.3.

4.3. Intensity-Aware Load Characterization (ILC)

This subsection introduces an Intensity-aware Load Characterization (ILC) framework for quantifying subsystem-level pressure induced by interaction-driven mobile workloads. ILC is built directly on the structured micro-architecture event representation established in Section 4.1 and serves as a higher-level abstraction that connects fine-grained event observations with interpretable workload characterization.

As described in Section 4.1, the important micro-architecture event set

E_{K}

is semantically partitioned into subsystem-specific groups

{F_{g}}_{g = 1}^{G}

and further compacted into redundancy-reduced representative sets

{F_{g}^{'}}_{g = 1}^{G}

. These sets define a stable and interpretable subsystem-oriented feature space. ILC operates exclusively on

F_{g}^{'}

and does not introduce additional event selection or restructuring steps.

Mobile application behavior is inherently sensitive to both user interactions and underlying micro-architectures. To explicitly account for these dimensions in a general form, a set of interaction operations is considered.

O = {o_{1}, o_{2}, \dots, o_{| O |}},

where each

o \in O

represents a distinct user interaction pattern, and a set of mobile platforms

A = {a_{1}, a_{2}, \dots, a_{| A |}},

where each

a \in A

denotes a representative mobile micro-architecture platform. This abstraction ensures that the ILC formulation is independent of specific devices or interaction instances.

For a benchmark b, an event

e \in E_{K}

, an architecture

a \in A

, and an interaction

o \in O

, let

x_{a, o} (b, e)

denote the observed event value. To reflect the structural contribution of micro-architecture events to performance behavior, ILC leverages the importance scores derived during SGBRT-based modeling in Section 4.1. For each event

e \in E_{K}

with importance score

I_{e}

, a normalized importance weight is defined as

w_{e} = \frac{I_{e}}{\sum_{e^{'} \in E_{K}} I_{e^{'}}} .

(8)

Since raw event magnitudes are not directly comparable across interactions and platforms, ILC applies Z-score normalization to obtain a unified scale:

Z_{a, o} (b, e) = \frac{x_{a, o} (b, e) - μ_{e}}{σ_{e}},

(9)

where

μ_{e}

and

σ_{e}

are computed over all benchmarks, interactions, and architectures. Normalized values are then aggregated across interaction and architecture dimensions to characterize stable pressure tendencies:

\bar{Z} (b, e) = \frac{1}{| A | | O |} \sum_{a \in A} \sum_{o \in O} Z_{a, o} (b, e) .

(10)

Given the redundancy-reduced representative set

F_{g}^{'}

for subsystem g, the intensity of benchmark b on subsystem g is defined as

S_{b, g} = \sum_{e \in F_{g}^{'}} w_{e} \cdot \bar{Z} (b, e) .

(11)

S_{b, g}

provides a compact yet expressive measure of the pressure exerted by benchmark b on subsystem g, integrating both the magnitude of event activity and the structural relevance of events to performance behavior.

To facilitate interpretation and subsequent analysis, the dominant benchmark for each subsystem is further defined as

b_{g}^{*} = arg max_{b} S_{b, g} .

(12)

Benchmark

b_{g}^{*}

corresponds to the workload imposing the strongest aggregate pressure on subsystem g and is highlighted in later analysis as a representative case.

5. Experimental Setup

The experimental environment consisted of four representative mobile platforms that cover a wide range of contemporary mobile CPU micro-architectures. The evaluated devices include Huawei Mate 30 5G (Kirin 990 5G), Samsung Galaxy Note10 5G (Snapdragon 855 5G), Xiaomi Mi 11 Pro (Snapdragon 888 5G), and OPPO OnePlus Ace (Dimensity 8100-MAX 5G). These platforms adopt heterogeneous multi-core CPU designs, in which big, middle, and little cores differ in operating frequency, micro-architecture type, and cache hierarchy configuration. All experiments were conducted on manufacturer-customized Android systems (e.g., EMUI, MIUI, and One UI), all of which are based on Android 11.

To ensure micro-architecture-level evaluation fidelity, the cache configurations of the four platforms are further summarized. These configurations include the L1 instruction and data cache sizes, the per-core L2 cache capacities for different core types, and the shared L3 cache capacity. The detailed cache hierarchy information is reported in Table 2. Specifically, the L1 cache size varies from 32 KB to 64 KB across different core types, while the L2 cache capacity ranges from 128 KB to 1 MB depending on the micro-architecture design. In addition, the Mate 30 5G and Galaxy Note10 5G platforms are equipped with a 2 MB shared L3 cache, whereas the Mi 11 Pro and OnePlus Ace feature a larger 4 MB L3 cache. These cache hierarchy differences provide a necessary hardware basis for cross-architecture performance event analysis and subsequent investigation of feature stability across heterogeneous mobile platforms.

6. Results and Analysis

This section presents the experimental results and analysis of CharSPBench on interaction-driven mobile workloads. Based on the structured micro-architecture event representation and the proposed intensity-aware load characterization framework, distinct micro-architecture behaviors under representative user interactions are analyzed. This analysis reveals how different benchmarks exhibit heterogeneous execution characteristics across interaction scenarios. The analysis focuses on interaction-dependent performance characteristics, subsystem-level pressure patterns, and their consistency across heterogeneous mobile platforms, with the goal of exposing interpretable micro-architecture bottlenecks and workload tendencies.

6.1. Preliminary Analysis of Interaction-Driven Miss-Related Features

In this section, before introducing semantic grouping and de-redundant modeling of micro-architecture events, miss-related features are first examined from an intuitive perspective. This preliminary analysis provides an initial understanding of how micro-architecture pressure varies across different interaction scenarios. Miss behaviors associated with branch prediction, address translation, and cache hierarchy directly reflect control flow perturbations and data access mismatches during execution, and their variations are often closely correlated with interaction-induced changes in program behavior.

It should be noted that, in the subsequent intensity-aware load characterization, some miss-related features may be merged or excluded during redundancy reduction due to low importance weights or strong correlation with access- or refill-related features. This processing is intended to construct a compact and stable feature representation and does not diminish the analytical value of miss behavior itself. Accordingly, before intensity modeling, a representative set of miss-related features is selected and normalized using the PKI (per kilo instruction) metric. By comparing miss behavior distributions under sliding, switching, and quenching interactions, this preliminary analysis provides direct behavioral intuition to motivate subsequent feature modeling and intensity characterization.

Insight 1: Interaction scenarios exhibit stable differentiation in miss pressure intensity and distribution, consistently across platforms.

As shown in Figure 3, interaction scenarios exhibit a clear and stable differentiation in the proportional composition of key miss-related features. Here, M30, N10, M11, and OA denote Mate30, Note10, Mi11 Pro, and OnePlus Ace, respectively, and the full names of the miss-related feature abbreviations are provided in Table 3. Across all four platforms, sliding consistently shows the highest miss contribution, switching follows, and quenching remains the lowest. This relative ordering is preserved on Mate30, Note10, Mi11 Pro, and OnePlus Ace, indicating that miss pressure differentiation is primarily driven by interaction behavior rather than platform-specific micro-architecture details. Despite differences in absolute event counts and event observability, the interaction-driven relative pressure structure remains reproducible across platforms.

From a micro-architecture perspective, sliding interactions trigger sustained interface updates and frequent rendering activity, causing rapid evolution of execution paths and data access patterns. This behavior reduces control-flow predictability and increases pressure on the front end, address translation, and cache hierarchy simultaneously, leading to a compound elevation of branch-related, TLB-related, and cache-related misses. Switching interactions correspond to phase-level path and context transitions, resulting in moderate and more localized miss perturbations, while quenching interactions constrain execution activity and allow prediction and memory structures to converge, producing consistently low miss contributions.

Importantly, Figure 3 captures aggregate proportional relationships among multiple miss-related features rather than isolated event fluctuations. By emphasizing relative contributions, this representation mitigates platform-dependent effects and provides a robust basis for identifying interaction-driven pressure patterns. These observations motivate subsequent intensity-aware load characterization, which seeks to quantify such interaction-induced pressure in a unified and cross-platform manner.

Insight 2: Write-path related miss features are unobservable on newer platforms and exhibit stronger association with active interaction scenarios on observable platforms.

As shown in Figure 3, write-path related miss events exhibit clear platform-dependent observability. Branch-store-misses and L1-dcache-store-misses (BRSMPKI and 1DSMPKI) are only reported on Mate30 and Note10, and are absent on Mi 11 Pro and OnePlus Ace, reflecting differences in PMU event support and event semantics across micro-architectures.

On the platforms where these events are observable, both BRSMPKI and 1DSMPKI show a consistent interaction-dependent bias, with slightly higher proportional contributions under sliding and switching and lower contributions under quenching. This trend aligns with more frequent state updates and data write activity during interaction-active phases, while constrained execution in quenching reduces write-path activation. From a micro-architecture perspective, write-path related misses are closely tied to cache line state transitions and coherence-related mechanisms, making them more sensitive to interaction activity than to sustained read-dominated execution.

Due to their lack of observability on newer platforms, write-path related miss events are not suitable as unified inputs for subsequent intensity-aware modeling. Nevertheless, their consistent interaction-dependent trends on legacy platforms provide complementary insight into write-path pressure induced by active user interactions.

6.2. Semantic Grouping and Redundancy Reduction of Important Micro-Architecture Features

Based on the semantic grouping and redundancy reduction described above, a representative feature set composed of five key micro-architecture subsystems is obtained. This reduced set preserves the structural information of the original 57 micro-architecture events while substantially compressing the event space and eliminating statistical overlap among semantically similar events. As a result, each subsystem is represented by a compact set of features that captures its dominant sources of performance pressure with minimal semantic redundancy.

Using this refined feature representation, the pressure distribution of individual SPBench benchmarks across different micro-architecture subsystems can be evaluated more accurately, enabling clearer identification of dominant performance bottlenecks and cross-benchmark differences under interaction-driven execution. Table 4 summarizes the final de-redundant feature sets for the five subsystems, which serve as the foundation for subsequent intensity-based classification and micro-architecture insight analysis.

6.3. Intensity Profiling of SPBench Benchmarks

Based on the representative feature sets listed in Table 4, the pressure intensity of each SPBench benchmark is quantified across different micro-architecture feature categories using the previously defined intensity metric. The corresponding intensity-based classification results are then derived. Specifically, intensity scores are computed by aggregating standardized feature values collected from four mobile platforms (Huawei Mate 30 5G, Samsung Galaxy Note10 5G, Xiaomi Mi 11 Pro, and OPPO OnePlus Ace) under three interaction scenarios (sliding, switching, and quenching), together with the feature importance weights learned by SGBRT. This aggregation strategy captures stable pressure tendencies that persist across both interaction patterns and heterogeneous mobile SoC designs.

The resulting intensity distributions are summarized in Table 5. This table not only reports the intensity-oriented tendencies of individual benchmarks across different feature categories, but also highlights the benchmark with the highest intensity in each category, serving as the most representative high-pressure workload for the corresponding micro-architecture subsystem. These results provide a direct basis for subsequent targeted bottleneck analysis and architecture-level design insights.

Insight 3: Highly representative benchmarks tend to exhibit weak subsystem-level intensity, while strong bottleneck concentration is more often observed in less representative benchmarks.

As shown in Table 5, benchmarks with higher representativeness in SPBench, such as Tmall, Coolapk, Netease, and GoogleDrive, generally exhibit few intensity markings across the five micro-architecture subsystems. Some of these benchmarks do not show a pronounced intensity tendency in any subsystem. In contrast, benchmarks with lower representativeness, including Wiz, Messenger, PVZ, Easymoney, and Health, frequently exhibit clear intensity patterns across multiple subsystems and are often identified as the most pressure-dominant workloads within specific categories.

This observation is consistent with the construction mechanism of SPBench. Benchmark representativeness is determined by the clustering distribution of one hundred real applications under three interaction scenarios. Highly representative benchmarks cover larger application clusters and therefore reflect averaged or mainstream execution behavior. Such workloads tend to impose moderate and balanced pressure across multiple micro-architecture subsystems, rather than forming a dominant bottleneck in a single subsystem. As a result, they are less likely to be classified as strongly subsystem-intensive.

In contrast, benchmarks with lower representativeness correspond to smaller and more specialized application clusters. Their micro-architecture behavior is often dominated by pressure concentrated on one or a few key subsystems, such as the cache hierarchy, branch control, speculative execution, or the memory system. These benchmarks therefore play an important role in exposing subsystem-specific bottlenecks and providing actionable insights for micro-architecture analysis and design evaluation.

Overall, highly representative benchmarks in SPBench primarily serve to preserve coverage of real application behavior, while less representative but bottleneck-concentrated benchmarks complement them by revealing extreme pressure patterns. Together, they form a balanced benchmark suite that supports both representativeness and micro-architecture insight extraction.

Insight 4: Under sliding interactions, the Health benchmark exhibits amplified L1 refills and frontend misses, which progressively expose accesses to lower cache levels and result in increased L2/L3 refill and writeback activities; this effect is substantially weaker under switching and quenching interactions.

To analyze cache hierarchy behavior, it is necessary to focus on a compact set of representative cache-related events, since the cache subsystem exposes a large number of correlated performance indicators across multiple levels. SPBench is constructed using micro-architecture events collected from the Mate 30 5G, Galaxy Note10 5G, and Mi 11 Pro platforms through the BEMAP methodology. To evaluate whether the cache-related behaviors captured by SPBench generalize beyond the construction platforms, cache behavior is examined on an independent mobile device, OPPO OnePlus Ace (OA), which is not involved in benchmark construction. Accordingly, the following analysis uses OA as a representative example.

As shown in Figure 4, the sliding scenario exhibits the most prominent cache-pressure profile for Health. At the top of the hierarchy, both L1 data cache refill reads (1DCRFPKI) and instruction cache misses (1ILMPKI) are clearly higher than those in switching and quenching, indicating that sliding introduces sustained disruptions to both data locality and frontend instruction fetch. In addition, several L2 and L3 indicators rise simultaneously under sliding, including L2 victim writeback (2DCWVPKI), L2 refill reads (2DCRFPKI), and L3 refill (3DCRFPKI). This pattern suggests that increased activity and misses at the L1 level expose more accesses to lower cache levels, where they appear as stronger refill and writeback demands.

In contrast, the cache-pressure profile under switching is noticeably weaker and more localized, consistent with a more phase-like execution pattern dominated by transient state transitions. Under quenching, the overall cache activity further contracts, and most cache indicators remain at low levels, reflecting reduced foreground rendering work and a smaller active working set.

Finally, the cache hierarchy configuration helps explain why mid- and lower-level cache indicators are emphasized when interpreting interactive behaviors. Across the evaluated platforms, L1 cache capacities are largely similar across core types, whereas L2 capacities vary more substantially across designs, with L3 differences being moderate (Table 2). Therefore, once interactive execution disrupts L1 locality, the resulting pressure is more likely to be reflected by L2/L3 refills and writebacks, which capture how misses propagate and accumulate along the hierarchy.

Insight 5: Instruction-side address translation pressure is significantly amplified for Wiz under quenching interactions, driven by a mismatch between active instruction page sets and iTLB coverage.

As shown in Figure 5, the instruction-side TLB behavior of Wiz exhibits a clear interaction-dependent pattern across all four platforms. Under the sliding and switching scenarios, execution largely follows stable document rendering and editing paths. The active instruction page set therefore evolves slowly, allowing iTLB entries to be effectively reused. As a result, the instruction TLB refill rate (IITRPKI), page walk activity (ITWPKI), and translation access intensity (ITLPKI) remain low and stable.

In contrast, under the quenching scenario, execution no longer relies on continuous rendering loops. Instruction paths and their associated translation contexts are frequently disrupted, causing the active instruction page set to diverge from the existing iTLB coverage. This mismatch does not primarily manifest as a sharp increase in TLB refills, but instead leads to a pronounced amplification of translation accesses and page table walks, reflected by the synchronized rise of ITLPKI and ITWPKI across platforms.

Notably, the magnitude of this amplification varies across different mobile SoCs, indicating that quenching-induced instruction-side translation pressure is highly sensitive to micro-architecture design choices. Overall, the TLB behavior of Wiz demonstrates that interaction patterns alone can fundamentally reshape instruction-side address translation pressure. Evaluations limited to continuous interaction scenarios may therefore underestimate frontend constraints that emerge during less active, but structurally disruptive, interaction phases.

Insight 6: Under the sliding interaction scenario, the Easymoney application exhibits significantly intensified dynamic variations in its branch control-flow paths. Branch prediction path perturbations and branch-related memory access pressure are synchronously amplified. On most evaluated platforms, the feature intensity reaches its maximum under this scenario, and sliding consistently demonstrates the most pronounced variation magnitude across platforms. In contrast, the branch control-flow behavior under switching and quenching scenarios is generally more convergent.

This phenomenon reflects the direct impact of interaction pattern variations on the stability of branch control flow. As illustrated in Figure 6, under the sliding scenario, Easymoney must continuously respond to high-frequency user inputs and interface scrolling operations. Its execution repeatedly traverses conditional evaluations, state updates, and event dispatch routines, causing branch paths to fluctuate more dynamically across consecutive execution intervals. This behavior is directly manifested as a significant increase in branch prediction path perturbation intensity (BRPMPKI), accompanied by a synchronous rise in branch-related memory access pressure (BRLMPKI). These observations indicate that, under sliding interactions, branch prediction and control-flow handling must continuously adapt to rapidly changing execution paths, thereby incurring elevated branch-related control pressure.

In contrast, under the switching scenario, interface transitions typically involve a limited number of control-flow redirections. Branch behavior in this case exhibits clear phase characteristics, with execution paths converging within a short period. As a result, the overall intensity of branch-related features is markedly lower than that observed under the sliding scenario. Under the quenching scenario, the application no longer sustains continuous foreground interaction. Execution is primarily driven by sporadic background logic, further simplifying the branch control-flow structure. Consequently, branch prediction path perturbations and related memory access behaviors converge, without inducing sustained branch-related pressure.

Overall, the branch control-flow behavior of Easymoney is not solely determined by application functionality complexity, but is highly dependent on the execution path dynamics induced by interaction patterns. Under continuously interaction-driven sliding scenarios, frequent branch path variations more readily amplify branch prediction perturbations and branch-related memory access behavior, rendering branch control a potential performance limiting factor.

Insight 7: In PVZ (Plants vs. Zombies), a game workload that is not dominated by high-frequency rendering, speculation-related memory access and synchronization behaviors are systematically amplified under non-foreground interaction scenarios, exhibiting a pronounced background-execution-dominated characteristic.

As shown in Figure 7, unlike interaction-intensive applications that rely on continuous user input, the core execution logic of PVZ is primarily driven by game state progression, unit behavior evaluation, and time-driven events. Its computation continues to execute even in the absence of active foreground interaction. Correspondingly, under the quenching scenario, speculative load/store execution events (LSPCPKI) are significantly higher than those observed under sliding and switching across multiple platforms. This indicates that during sustained background execution, the program exposes more opportunities related to memory access uncertainty for speculative execution. Meanwhile, speculation failure events associated with synchronization semantics (STPFPKI), as well as exclusive access trigger behaviors (LDREXPKI), also exhibit consistent amplification under the quenching scenario, reflecting more active internal state updates and shared resource management during background execution phases.

It is worth noting that on certain platforms (e.g., M11), speculation-related features under the switching scenario exceed those observed under quenching. This discrepancy does not alter the overall trend, but instead reflects implementation differences in task scheduling and power management policies across platforms. Frequent foreground–background transitions may introduce stronger execution state perturbations over short time scales, thereby amplifying speculation-related behavior along memory access and synchronization paths. Overall, these results indicate that the speculative execution pressure in PVZ is not directly driven by interaction intensity, but primarily stems from its state-machine- and time-driven game logic structure. This characteristic reveals a representative micro-architecture behavior pattern of background-computation-dominated game workloads.

Insight 8: In Messenger, a background-resident communication application, memory and interconnect access behaviors are systematically amplified under the quenching scenario, exhibiting a pronounced background data access–dominated characteristic.

As shown in Figure 8, memory- and interconnect-related features of Messenger under the quenching scenario are consistently higher than those observed under sliding and switching across all four evaluated platforms. In particular, main memory read access events (MARPKI) exhibit a coherent increase in the screen-off state, while bus access events (BARPKI) are simultaneously intensified. These observations indicate that a substantial fraction of memory requests exceeds the effective coverage of private cache hierarchies and is increasingly exposed to lower-level memory and interconnect paths.

This behavior reflects the fact that Messenger continues to execute background logic centered on network communication, message synchronization, and data maintenance even in the absence of foreground interaction. Unlike foreground interaction–driven workloads that typically involve short-lived data and limited working sets, such background tasks operate on larger and cross-module data structures, whose access patterns are less amenable to cache locality. As a result, memory accesses are more likely to reach main memory and amplify interconnect traffic during background execution. In contrast, under the sliding and switching scenarios, execution is primarily constrained by foreground interaction and interface state updates, and memory access behavior tends to converge within the cache hierarchy, preventing memory and interconnect subsystems from becoming dominant sources of sustained pressure.

Overall, the memory and interconnect behavior of Messenger suggests that, for background-resident communication applications, architectural performance pressure does not necessarily diminish with reduced interaction intensity. Instead, under non-foreground interaction scenarios, performance demand may shift toward main memory and interconnect subsystems. These results highlight the role of interaction patterns in reshaping system-level memory access paths and underscore the importance of explicitly considering background execution scenarios when characterizing the micro-architecture behavior of persistent communication workload.

6.4. Discussion of Implications

The eight interaction-dependent insights presented above indicate that micro-architecture bottlenecks in mobile workloads often span multiple subsystems under specific interaction contexts, rather than appearing as isolated single-component limitations. Connecting these observations allows a more holistic interpretation of interaction-driven performance behavior and its architectural implications.

Different interaction scenarios induce distinct cross-subsystem pressure patterns. Under sliding interactions, continuous interface updates and high-frequency input handling coincide with unstable control flow and increased frontend activity. At the same time, pressure in the cache hierarchy also rises, reflected by the concurrent amplification of branch-related and cache-related features. These trends are consistently observed in Figure 4 and Figure 6. In contrast, under quenching interactions, execution becomes structurally discontinuous and is characterized by increased pressure on address translation, speculative execution, and main memory access paths, as shown in Figure 5, Figure 6, Figure 7 and Figure 8.

These cross-subsystem pressure patterns expose the limitations of optimization strategies that focus on individual hardware components in isolation. For example, simply increasing cache capacity is unlikely to alleviate sliding-induced performance degradation if branch predictability and frontend robustness are not improved simultaneously. Similarly, reducing memory latency alone may not effectively mitigate quenching-induced slowdowns if instruction-side address translation and speculation-related behaviors remain unaddressed.

From a design perspective, effective interaction-aware optimization requires coordinated consideration of the dominant subsystem combinations associated with each interaction scenario. Sliding-dominated usage patterns benefit most from architectures that emphasize predictable control flow and robust frontend handling. In contrast, background or low-interaction scenarios demand stronger support for address translation efficiency, speculative execution robustness, and memory–interconnect resilience. Bridging PMU-level observations with real-world mobile usage patterns is therefore essential for translating micro-architecture analysis into tangible improvements in user-perceived responsiveness and overall system efficiency.

It is worth noting that CharSPBench is designed as an analysis-oriented characterization framework rather than a modular predictive model. Its key components, including SGBRT-based important event identification and the Intensity-aware Load Characterization (ILC), form a tightly coupled analysis pipeline. Removing the SGBRT stage requires reverting to the full high-dimensional PMU event space, which significantly complicates bottleneck identification, while removing the ILC stage limits the analysis to isolated events and reduces subsystem-level interpretability. As a result, conventional ablation-style decomposition is not well aligned with the objectives of this structured micro-architecture analysis workflow.

While CharSPBench provides a structured and interpretable framework for interaction-aware micro-architecture characterization, its scope is intentionally constrained. The analysis focuses on a small set of representative interaction scenarios to balance realism and interpretability, rather than exhaustively covering all possible user behaviors. Moreover, the proposed approach is analysis-oriented and does not incorporate optimization or tuning mechanisms, as its goal is to expose interpretable bottleneck patterns rather than to directly improve performance. Finally, like other real-device mobile profiling studies, the analysis may be affected by background system activities, which reflects an inherent trade-off between measurement fidelity and practical deployability.

7. Conclusions and Future Work

This paper presented CharSPBench, an interaction-aware micro-architecture characterization framework for mobile workloads under realistic user interaction scenarios. By explicitly incorporating interaction-triggered execution segments into the analysis process, CharSPBench addresses the limitations of existing profiling and characterization approaches that assume stable execution behavior. The proposed framework enables interpretable analysis of micro-architecture behavior variations across different interaction scenarios on mobile processors.

Based on CharSPBench, this study distilled eight representative micro-architecture insights that reveal how interaction patterns reshape execution paths and resource pressure across major architectural subsystems. Experimental results on multiple real-world mobile applications and heterogeneous smartphone platforms demonstrate that CharSPBench can consistently identify key performance bottlenecks under diverse interaction conditions, providing practical and interpretable guidance for mobile processor evaluation and optimization.

Future work will extend CharSPBench to emerging interaction-intensive workloads with more complex execution dynamics, such as augmented reality applications and cloud-based mobile games. These scenarios introduce richer interaction patterns, tighter latency constraints, and stronger coupling between local execution and remote services, posing new challenges for micro-architecture characterization. Exploring how interaction-aware analysis can be adapted to such environments may further enhance the applicability of CharSPBench in next-generation mobile systems.

Author Contributions

Conceptualization, C.O. and G.L.; Methodology, C.O.; Software, C.O.; Formal analysis, C.O.; Writing—original draft preparation, C.O.; Writing—review and editing, C.O., Z.Y. and G.L.; Supervision, G.L.; Funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC), grant numbers 62272176 and 62302180.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ouyang, C.; Xin, J.; Zeng, S.; Li, G.; Li, J.; Yu, Z. Constructing a Supplementary Benchmark Suite to Represent Android Applications with User Interactions by using Performance Counters. ACM Trans. Archit. Code Optim. 2025, 22, 1–28. [Google Scholar] [CrossRef]
Kesavan, R.; Gay, D.; Thevessen, D.; Shah, J.; Mohan, C. Firestore: The nosql serverless database for the application developer. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; pp. 3376–3388. [Google Scholar]
Li, T.; Xia, T.; Wang, H.; Tu, Z.; Tarkoma, S.; Han, Z.; Hui, P. Smartphone app usage analysis: Datasets, methods, and applications. IEEE Commun. Surv. Tutor. 2022, 24, 937–966. [Google Scholar] [CrossRef]
Hort, M.; Kechagia, M.; Sarro, F.; Harman, M. A survey of performance optimization for mobile applications. IEEE Trans. Softw. Eng. 2021, 48, 2879–2904. [Google Scholar] [CrossRef]
Feng, Y.; Zhu, Y. Pes: Proactive event scheduling for responsive and energy-efficient mobile web computing. In Proceedings of the 46th International Symposium on Computer Architecture, Phoenix, AZ, USA, 22–26 June 2019; pp. 66–78. [Google Scholar]
Bose, P.; Das, D.; Vasan, S.; Mariani, S.; Grishchenko, I.; Continella, A.; Bianchi, A.; Kruegel, C.; Vigna, G. Columbus: Android app testing through systematic callback exploration. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, VIC, Australia, 14–20 May 2023; pp. 1381–1392. [Google Scholar]
Lin, H.; Liu, C.; Li, Z.; Qian, F.; Li, M.; Xiong, P.; Liu, Y. Aging or glitching? What leads to poor Android responsiveness and what can we do about it? IEEE Trans. Mob. Comput. 2023, 23, 1521–1533. [Google Scholar] [CrossRef]
Qian, F.; Wang, Z.; Gerber, A.; Mao, Z.; Sen, S.; Spatscheck, O. Profiling resource usage for mobile applications: A cross-layer approach. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, Washington, DC, USA, 28 July–1 June 2011; pp. 321–334. [Google Scholar]
Flores-Martin, D.; Laso, S.; Herrera, J.L. Enhancing Smartphone Battery Life: A Deep Learning Model Based on User-Specific Application and Network Behavior. Electronics 2024, 13, 4897. [Google Scholar] [CrossRef]
Suo, K.; Shi, Y.; Hung, C.C.; Bobbie, P. Quantifying context switch overhead of artificial intelligence workloads on the cloud and edges. In Proceedings of the 36th Annual ACM Symposium on Applied Computing, Gwangju, Korea, 22–26 March 2021; pp. 1182–1189. [Google Scholar]
Fried, J.; Ruan, Z.; Ousterhout, A.; Belay, A. Caladan: Mitigating interference at microsecond timescales. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), Banff, AL, Canada, 4–6 November 2020; pp. 281–297. [Google Scholar]
Miao, Z.; Shao, C.; Li, H.; Tang, Z. Review of Task-Scheduling Methods for Heterogeneous Chips. Electronics 2025, 14, 1191. [Google Scholar] [CrossRef]
Korndörfer, J.H.M.; Eleliemy, A.; Simsek, O.S.; Ilsche, T.; Schöne, R.; Ciorba, F.M. How do os and application schedulers interact? an investigation with multithreaded applications. In Proceedings of the European Conference on Parallel Processing, Madrid, Spain, 26–30 August 2023; pp. 214–228. [Google Scholar]
Weingarten, M.E.; Grieco, M.; Edwards, S.; Khan, T.A. Icicle: Open-Source Hardware Support for Top-Down Microarchitectural Analysis on RISC-V. In Proceedings of the 2025 IEEE International Symposium on Workload Characterization (IISWC), Irvine, CA, USA, 12–14 October 2025; pp. 464–477. [Google Scholar] [CrossRef]
Zhu, Y.; Wei, S.; Tiwari, M. Revisiting Browser Performance Benchmarking From an Architectural Perspective. IEEE Comput. Archit. Lett. 2022, 21, 113–116. [Google Scholar] [CrossRef]
Jang, H.; Jo, J.E.; Lee, J.; Kim, J. Rpstacks-mt: A high-throughput design evaluation methodology for multi-core processors. In Proceedings of the 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka, Japan, 20–24 October 2018; pp. 586–599. [Google Scholar]
Bai, C.; Huang, J.; Wei, X.; Ma, Y.; Li, S.; Zheng, H.; Yu, B.; Xie, Y. ArchExplorer: Microarchitecture exploration via bottleneck analysis. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Toronto, ON, Canada, 28 October–1 November 2023; pp. 268–282. [Google Scholar]
Bai, C.; Sun, Q.; Zhai, J.; Ma, Y.; Yu, B.; Wong, M.D. BOOM-Explorer: RISC-V BOOM microarchitecture design space exploration framework. In Proceedings of the 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), Munich, Germany, 1–4 November 2021; pp. 1–9. [Google Scholar]
Criswell, K.; Adegbija, T. A survey of phase classification techniques for characterizing variable application behavior. IEEE Trans. Parallel Distrib. Syst. 2019, 31, 224–236. [Google Scholar] [CrossRef]
Wang, K.; Li, Y.; Wang, C.; Jia, T.; Chow, K.; Wen, Y.; Dou, Y.; Xu, G.; Hou, C.; Yao, J.; et al. Characterizing job microarchitectural profiles at scale: Dataset and analysis. In Proceedings of the 51st International Conference on Parallel Processing, Bordeaux, France, 29 August–1 September 2022; pp. 1–11. [Google Scholar]
Schall, D.; Margaritov, A.; Ustiugov, D.; Sandberg, A.; Grot, B. Lukewarm serverless functions: Characterization and optimization. In Proceedings of the 49th Annual International Symposium on Computer Architecture, New York, NY, USA, 18–22 June 2022; pp. 757–770. [Google Scholar]
Segu Nagesh, S.; Fernando, N.; Loke, S.W.; Neiat, A.G.; Pathirana, P.N. A Dependency-Aware Task Stealing Framework for Mobile Crowd Computing. Fut. Int. 2025, 17, 446. [Google Scholar] [CrossRef]
Li, S.; Yu, F.; Zhang, S.; Yin, H.; Lin, H. Optimization of Direct Convolution Algorithms on ARM Processors for Deep Learning Inference. Mathematics 2025, 13, 787. [Google Scholar] [CrossRef]
Andrijević, N.; Lovreković, Z.; Salkić, H.; Šarčević, Đ.; Perišić, J. Benchmarking PHP–MySQL Communication: A Comparative Study of MySQLi and PDO Under Varying Query Complexity. Electronics 2025, 15, 21. [Google Scholar] [CrossRef]
Abbasi, M.; Bernardo, M.V.; Váz, P.; Silva, J.; Martins, P. Adaptive and scalable database management with machine learning integration: A PostgreSQL case study. Information 2024, 15, 574. [Google Scholar] [CrossRef]
Wang, Z.; Liu, S.; Ji, D.; Yi, W. Improving real-time performance of micro-ros with priority-driven chain-aware scheduling. Electronics 2024, 13, 1658. [Google Scholar] [CrossRef]
Li, B.; Arora, R.; Samsi, S.; Patel, T.; Arcand, W.; Bestor, D.; Byun, C.; Roy, R.B.; Bergeron, B.; Holodnak, J.; et al. Ai-enabling workloads on large-scale gpu-accelerated system: Characterization, opportunities, and implications. In Proceedings of the 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea, 2–6 April 2022; pp. 1224–1237. [Google Scholar]
Bucek, J.; Lange, K.D.; v. Kistowski, J. SPEC CPU2017: Next-generation compute benchmark. In Proceedings of the Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, Berlin, Germany, 9–13 April 2018; pp. 41–42. [Google Scholar]
Janapa Reddi, V.; Kanter, D.; Mattson, P.; Duke, J.; Nguyen, T.; Chukka, R.; Shiring, K.; Tan, K.S.; Charlebois, M.; Chou, W.; et al. MLPerf mobile inference benchmark: An industry-standard open-source machine learning benchmark for on-device AI. Proc. Mach. Learn. Syst. 2022, 4, 352–369. [Google Scholar]
Wang, Y.; Lee, V.; Wei, G.Y.; Brooks, D. Predicting new workload or CPU performance by analyzing public datasets. ACM Trans. Archit. Code Optim. (TACO) 2019, 15, 1–21. [Google Scholar] [CrossRef]
Kariofillis, V.; Jerger, N.E. Workload Characterization of Commercial Mobile Benchmark Suites. In Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Indiana, IN, USA, 5–7 May 2024; pp. 73–84. [Google Scholar]
Ferraz, O.; Menezes, P.; Silva, V.; Falcao, G. Benchmarking vulkan vs. opengl rendering on low-power edge gpus. In Proceedings of the 2021 International Conference on Graphics and Interaction (ICGI), Porto, Portugal, 4–5 November 2021; pp. 1–8. [Google Scholar]
Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
Mienye, I.D.; Sun, Y. A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524. [Google Scholar] [CrossRef]
Yu, Z.; Xiong, W.; Eeckhout, L.; Bei, Z.; Mendelson, A.; Xu, C. Mia: Metric importance analysis for big data workload characterization. IEEE Trans. Parallel Distrib. Syst. 2017, 29, 1371–1384. [Google Scholar] [CrossRef]

Figure 1. PMU event coverage of representative mobile CPU benchmark suites.

Figure 2. Overview of the CharSPBench framework for interaction-aware micro-architecture bottleneck characterization. SGBRT is Stochastic Gradient Boosting Regression Trees.

Figure 3. Proportional distribution of key miss-related features across different interaction scenarios.

Figure 4. Cache behavior of the SPBench benchmark Health on the OnePlus Ace (OA) platform under different interaction scenarios.

Figure 5. TLB behavior of the SPBench benchmark Wiz across different interaction scenarios on four mobile platforms.

Figure 6. Branch behavior of the Easymoney benchmark in SPBench under different interaction scenarios on four smartphones.

Figure 7. Speculative execution behavior of the PVZ benchmark in SPBench under different interaction scenarios on four smartphones.

Figure 8. Memory and interconnect behavior of the Messenger benchmark in SPBench under different interaction scenarios on four smartphones.

Table 2. Detailed Cache Hierarchy Configuration of Experimental Mobile Platforms.

Phone	Huawei Mate 30 5G	Samsung Galaxy Note10 5G
SoC	Kirin 990 5G	Snapdragon 855 5G
L1 Cache	Per Big Core: 64 KB Inst. & 64 KB Data Per Mid Core: 64 KB Inst. & 64 KB Data Per Little Core: 32 KB Inst. & 32 KB Data	Per Big Core: 64 KB Inst. & 64 KB Data Per Mid Core: 64 KB Inst. & 64 KB Data Per Little Core: 32 KB Inst. & 32 KB Data
L2 Cache	Per Big Core: 512 KB Per Mid Core: 512 KB Per Little Core: 128 KB	Per Big Core: 512 KB Per Mid Core: 256 KB Per Little Core: 128 KB
L3 Cache	2MB	2MB
Phone	Xiaomi Mi 11 Pro	OPPO OnePlus Ace
SoC	Snapdragon 888 5G	Dimensity 8100-MAX 5G
L1 Cache	Per Big Core: 64 KB Inst. & 64 KB Data Per Mid Core: 64 KB Inst. & 64 KB Data Per Little Core: 32 KB Inst. & 32 KB Data	Per Big Core: 64 KB Inst. & 64 KB Data Per Little Core: 32 KB Inst. & 32 KB Data
L2 Cache	Per Big Core: 1MB Per Mid Core: 512 KB Per Little Core: 128 KB	Per Big Core: 512 KB Per Little Core: 128 KB
L3 Cache	4 MB	4 MB

Table 3. Miss-related features and abbreviations.

Features	Abbreviation
branch-load-misses	BRLMPKI
branch-store-misses	BRSMPKI
dTLB-load-misses	DTLMPKI
iTLB-load-misses	ITLMPKI
L1-dcache-load-misses	1DLMPKI
L1-dcache-store-misses	1DSMPKI
L1-icache-load-misses	1ILMPKI
branch-misses	BRMSPKI
cache-misses	CAMIPKI

Table 4. Five semantically grouped and de-redundant micro-architecture feature sets.

Subsystem Category	Micro-Architecture Feature	Abbreviation
Cache Hierarchy	raw-l2d-cache-wb-victim	2DCWVPKI
	raw-l2d-cache-refill-rd	2DCRFPKI
	raw-l2d-cache-wr	2DCWPKI
	raw-l2d-cache-rd	2DCRPKI
	raw-l3d-cache-rd	3DCRPKI
	raw-l3d-cache-refill	3DCRFPKI
	raw-l1d-cache-wb-clean	1DCWCPKI
	raw-l1d-cache-refill-rd	1DCRFPKI
	L1-icache-load-misses	1ILMPKI
TLB (Address Translation)	raw-l1i-tlb-refill	1ITRPKI
	raw-itlb-walk	ITWPKI
	iTLB-loads	ITLPKI
Branch Control	raw-br-mis-pred	BRPMPKI
Branch Control	branch-load-misses	BRLMPKI
Speculative Execution	raw-ldst-spec	LSPCPKI
	raw-strex-fail-spec	STPFPKI
	raw-ldrex-spec	LDREXPKI
Memory and Interconnect	raw-mem-access-rd	MARPKI
Memory and Interconnect	raw-bus-access-rd	BARPKI

Table 5. Intensity-based distribution of SPBench benchmarks across five micro-architecture categories.

Benchmark	Cache Hierarchy	TLB Behavior	Branch Control	Speculative Execution	Memory and Interconnect
Tmall
Coolapk	✓			✓
Netease
Googledrive
Yinxiang
Baidu			✓
Gifmaker
PVZ		✓		*	✓
Wiz	✓	*	✓	✓	✓
Messenger	✓	✓	✓	✓	*
Meituan
Ctrip
Easymoney	✓	✓	*		✓
Zhihu
Health	*	✓	✓	✓	✓

Note: The benchmark suite is provided via GitHub: https://github.com/Ephemera-Ouyang/BEMAP (accessed on 14 January 2026). A checkmark (✓) indicates that the benchmark is classified as intensive in the corresponding category, while an asterisk (*) denotes the benchmark with the highest intensity in that category. Benchmarks are listed following the original SPBench ordering, which reflects their representativeness determined during benchmark construction.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ouyang, C.; Yang, Z.; Li, G. CharSPBench: An Interaction-Aware Micro-Architecture Characterization Framework for Smartphone Benchmarks. Electronics 2026, 15, 432. https://doi.org/10.3390/electronics15020432

AMA Style

Ouyang C, Yang Z, Li G. CharSPBench: An Interaction-Aware Micro-Architecture Characterization Framework for Smartphone Benchmarks. Electronics. 2026; 15(2):432. https://doi.org/10.3390/electronics15020432

Chicago/Turabian Style

Ouyang, Chenghao, Zhong Yang, and Guohui Li. 2026. "CharSPBench: An Interaction-Aware Micro-Architecture Characterization Framework for Smartphone Benchmarks" Electronics 15, no. 2: 432. https://doi.org/10.3390/electronics15020432

APA Style

Ouyang, C., Yang, Z., & Li, G. (2026). CharSPBench: An Interaction-Aware Micro-Architecture Characterization Framework for Smartphone Benchmarks. Electronics, 15(2), 432. https://doi.org/10.3390/electronics15020432

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Article metric data becomes available approximately 24 hours after publication online.

Article Menu

CharSPBench: An Interaction-Aware Micro-Architecture Characterization Framework for Smartphone Benchmarks

Abstract

1. Introduction

2. Related Work

3. Background and Motivation

3.1. SPBench Overview

3.2. Algorithms Used in This Study

3.2.1. Stochastic Gradient Boosting Regression Trees

3.2.2. Z-Score Normalization

4. CharSPBench Methodology

4.1. Interaction-Driven Micro-Architecture Event Modeling and Structuring

4.2. MIA-Based Analysis Procedure for Mobile Micro-Architecture

4.3. Intensity-Aware Load Characterization (ILC)

5. Experimental Setup

6. Results and Analysis

6.1. Preliminary Analysis of Interaction-Driven Miss-Related Features

6.2. Semantic Grouping and Redundancy Reduction of Important Micro-Architecture Features

6.3. Intensity Profiling of SPBench Benchmarks

6.4. Discussion of Implications

7. Conclusions and Future Work

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI