SCRAM: A Scenario-Based Framework for Evaluating Regulatory and Fairness Risks in AI Surveillance Systems

Kesgin, Kadir; Kosunalp, Selahattin; Beloev, Ivan

doi:10.3390/app15169038

Open AccessArticle

SCRAM: A Scenario-Based Framework for Evaluating Regulatory and Fairness Risks in AI Surveillance Systems

by

Kadir Kesgin

^1,*

,

Selahattin Kosunalp

¹ and

Ivan Beloev

²

¹

Department of Computer Technologies, Gönen Vocational School, Bandırma Onyedi Eylül University, 10250 Bandırma, Balıkesir, Türkiye

²

Department of Transport, University of Ruse, 7017 Ruse, Bulgaria

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(16), 9038; https://doi.org/10.3390/app15169038

Submission received: 2 July 2025 / Revised: 25 July 2025 / Accepted: 7 August 2025 / Published: 15 August 2025

(This article belongs to the Special Issue Artificial Intelligence on the Edge for Industry 4.0)

Download

Browse Figures

Versions Notes

Abstract

As artificial intelligence systems increasingly govern public safety operations, concerns over algorithmic fairness and legal compliance intensify. This study introduces a scenario-based evaluation framework (SCRAM) that simultaneously measures regulatory conformity and bias risks in AI-enabled surveillance. Using license plate recognition (LPR) systems in Türkiye as a case study, we simulate multiple operational configurations that vary decision thresholds and data retention periods. Each configuration is assessed through fairness metrics (SPD, DIR) and a compliance score derived from KVKK (Türkiye’s Personal Data Protection Law) and constitutional jurisprudence. Our findings show that technical performance does not guarantee normative acceptability: several configurations with high detection accuracy fail to meet legal and fairness thresholds. The SCRAM model offers a modular and adaptable approach to align AI deployments with ethical and legal standards and highlights how policy-sensitive parameters critically shape risk landscapes. We conclude with implications for real-time audit systems and cross-jurisdictional AI governance.

Keywords:

LPR; algorithmic fairness; KVKK compliance; SCRAM framework; AI surveillance

1. Introduction

This study evaluates AI-enabled surveillance systems through the lens of fairness and legal compliance, focusing on LPR in Türkiye. Key metrics include the Statistical Parity Difference (SPD), which measures disparities in detection rates across demographic groups, and the Disparate Impact Ratio (DIR), which assesses the ratio of detection outcomes between groups. These metrics, alongside a compliance score based on KVKK, form the core of our Scenario-based Compliance and Risk Assessment Model (SCRAM).

1.1. Global Surge in AI-Driven Surveillance

The recent decade has witnessed an exponential diffusion of sensing platforms based on artificial intelligence (AI) in public safety ecosystems. Deep learning pipelines—particularly convolutional neural networks for object detection and advanced optical character recognition (OCR) modules—now underpin more than 80 million camera units worldwide, a ten-fold increase since 2015 [1]. Market forecasts predict a compound annual growth rate of 19% for AI-enabled closed-circuit television (CCTV) between 2024 and 2029 [2]. The momentum of policy mirrors the technological boom: EU AI Act (2025) classifies remote biometric identification in public spaces as “high risk”, while Interpol’s AI Surveillance Guidelines (2024) articulates operational due diligence for law enforcement agencies [3,4].

1.2. Domestic Landscape: Türkiye’s KGYS and PTS Infrastructure

Türkiye’s Kent Güvenlik Yönetim Sistemi (KGYS) integrates police patrol data, emergency dispatch, and nationwide camera networks. According to the 2024 KGYS Activity Report, 30 412 fixed cameras and 4120 mobile units are operational; approximately 12% run license-plate recognition TS modes that link vehicle trajectories to national and international watchlists [5]. Pilot studies in [6] document a 27% rise in stolen-vehicle recovery attributable to PTS alerts. However, storage duration, threshold tuning

(τ)

and third-party data access vary drastically across provinces, reflecting a lack of unified compliance protocols.

1.3. Legal Context: The I–D–A Triad

Two landmark rulings demarcate the legal perimeter. Danıştay 15th Chamber, E. 2014/4562 unequivocally recognized license-plate strings as personal data, placing PTS within the scope of the KVKK. Constitutional Court, B.No 2018/30296 subsequently held that long-term storage of CCTV and PTS footage constitutes a disproportionate interference with the constitutional right to privacy (Art. 20). We crystallize these layered obligations into an I–D–A mapping: Ilke (KVKK Art. 4 principles), Danıştay precedent and AYM constitutional proportionality. Figure 1 visualizes how the triad intersects the PTS data lifecycle.

1.4. Research Gap

Existing Turkish scholarship bifurcates into performance-centric engineering studies [7] and doctrinal legal commentaries. The former optimize true-positive rates (TPR) but ignore lawful-processing constraints; the latter examine KVKK articles while abstracting away technical parameters such as confidence threshold

(τ)

or retention window

(δ)

. Internationally, integrated audit frameworks are emerging [8], yet none align with the domestic I–D–A constellation. No published work quantitatively co-optimizes compliance scores

(ε)

and bias metrics (SPD) for Turkish PTS deployments.

1.5. Contributions and Article Structure

We bridge the gap by introducing the Scenario-based Compliance and Risk Assessment Model (SCRAM) and applying it to a metropolitan PTS network comprising 325 fixed and 48 mobile cameras. Building on a 10,000-row anonymized Monte Carlo simulation annotated with region and vehicle-type metadata, we evaluate nine configuration scenarios. Our contributions are four-fold:

A reproducible Python codebase and synthetic dataset aligned with I–D–A principles;
A legal-technical compliance score $(ε)$ spanning five KVKK criteria and three precedential obligations;
Empirical evidence that strict thresholds $(τ \geq 0.94)$ reduce false-positive rates without inflating SPD beyond 0.06;
Policy guidance on edge-level anonymization and 30-day retention that jointly maximize KVKK adherence and fairness.

Organization of the Paper

The remainder of this paper is structured as follows: Section 2 reviews related literature and key concepts in LPR fairness and compliance. Section 3 details our simulation methodology and metric definitions. Section 4 presents experimental results and discusses the policy implications. Section 5 summarizes limitations and outlines future research directions. The Appendices provide full dataset generation scripts and metric computation procedures.

2. Literature Review

This section surveys existing research on AI-enabled surveillance, focusing on technical performance, legal compliance, and fairness considerations. It positions the SCRAM framework within global and domestic scholarship, highlighting gaps in integrated legal/technical evaluations.

2.1. Global Proliferation of AI-Enabled Surveillance

Between 2015 and 2024, at least 85 nations adopted some form of AI-augmented public-camera analytics, with license-plate recognition (LPR) ranking second only to facial recognition in deployment frequency [9]. The AI Surveillance Index (2025) shows a mean annual growth rate of 18% in LPR installations within OECD countries, driven by declining hardware costs and improved OCR accuracy [10]. Figure 2 illustrates the diffusion curve, highlighting Türkiye’s entry into the “high-penetration” quartile by 2023.

2.2. Performance Metrics in PTS Research

Engineering studies focus heavily on detection accuracy. [7] benchmarked six CNN backbones on the Turkish PTS dataset, reporting a top-1 precision of 97.1%. However, parameter sensitivity analyses remain scarce. [8] show that tightening the decision threshold from 0.90 to 0.95 lowers false positives by 40% but concomitantly increases inference latency. Table 1 compares recent accuracy—throughput trade-offs, underscoring the absence of integrated legal-risk considerations.

2.3. Data Protection Frameworks and Compliance Audits

The KVKK embodies five core principles (lawfulness, purpose limitation, data minimisation, accuracy, retention) that PTS operators must satisfy. Unlike the EU AI Act, KVKK lacks explicit “risk tier” classifications, delegating proportionality assessments to data controllers [6]. Early compliance audits (2019–2022) reveal inconsistent retention windows and inadequate logging of third-party queries [5].

2.4. Algorithmic Bias and Fairness Metrics

While LPR ostensibly processes vehicle data rather than sensitive personal traits, regional or socio-economic proxies can still induce disparate impacts. [1] illustrate how commercial-vehicle over-representation near freight hubs skews risk scores. Fairness metrics such as SPD and DIR are therefore recommended even for LPR audits [8]. Our study extends this line by linking bias scores to KVKK compliance in a unified optimisation schema.

2.5. Anonymization and Privacy-Enhancing Technologies

Edge-level hashing, k-token vaults and differential privacy

(ε \leq 1)

have emerged as practical mitigations for plate-to-identity linkage [11]. However, Turkish deployments rarely incorporate on-device anonymizers; encryption typically occurs post-ingest at central servers. We adopt the three-layer anonymization stack (edge masking → token vault → crypto-erasure) to test its effect on compliance scores.

2.6. Comparative Legislation: KVKK vs. EU AI Act

Table 2 cross-maps KVKK Art. 4 principles to EU AI Act Title III, revealing gaps in obligation granularity and enforcement timelines. Unlike the AI Act’s mandatory risk classification for remote biometric systems, KVKK depends heavily on judicial interpretation. The proposed SCRAM model integrates these differences by scoring scenario-specific outcomes under both legal regimes.

3. Methodology

This section describes the SCRAM framework’s design, including the synthetic dataset, simulation scenarios, compliance scoring, and fairness metrics. It provides a detailed methodology for evaluating LPR systems under legal and ethical constraints.

3.1. Key Metrics and Notation

The key fairness metrics used in this study are SPD and DIR. Let n denote the total number of samples in the dataset, let

\hat{Y}

be the predicted outcome by the LPR system (1 for positive detection, and 0 otherwise), and let A be a protected attribute such as region or vehicle type. The compliance score for each criterion i is denoted as

c_{i}

, calculated according to the rubric in Table 3. The overall compliance score S is then computed as

S = \sum_{i = 1}^{n} w_{i} \cdot c_{i}

, where

w_{i}

is the weight for criterion i (see Section 3.3 for full details).

3.2. Case Selection and Legal Mapping

This study focuses on the Plate Recognition System (PTS) as a subcomponent of Istanbul’s citywide Smart Surveillance Grid. The selection is grounded in legal precedent: Council of State ruling E.2014/4562 and Constitutional Court case B.No 2018/30296 define number plate data as personally identifiable information. These rulings shape compliance criteria such as storage duration, consent, and data minimisation.

3.3. Synthetic Dataset Construction

Given the lack of access to actual TP/FP logs due to legal constraints, we generated a synthetic dataset using stratified random sampling and distribution parameters derived from public sector reports. The dataset comprises 10,000 entries with variables:

Match vs. no_match labels
Region ∈ urban, suburb, periphery
Vehicle_type ∈ private, commercial

Seed value was fixed at 42 for reproducibility. Full generation code and documentation is included as Appendix A.

3.3.1. Simulation Setup and Scenario Selection

We constructed nine simulation scenarios by systematically varying the detection threshold (

τ \in {0.90, 0.94, 0.97}

) and data retention period (

δ \in {30, 90, 180}

days). The parameter grid was designed to span the operational range most frequently encountered in public sector deployments, as documented in the 2024 KGYS Report and international LPR benchmarks, thereby maximizing the external relevance of our simulated findings [5]. Alternative thresholds (e.g.,

τ = 0.85

) or retention periods (e.g.,

δ = 60

days) were not tested to maintain focus on prevalent configurations and due to computational constraints, but future work could expand the parameter grid for broader generalizability. While the dataset is synthetic due to the lack of public access to real LPR logs (see Limitations), the data distribution and scenario parameters are derived from official reports and court documents (see Table 4). For each scenario, SPD and DIR are calculated by partitioning the dataset according to region and vehicle type, then applying the formulas defined in Section 3.1. Full metric computation scripts are available in Appendix A.

3.3.2. Limitations of Synthetic Data

Our reliance on synthetic datasets—mandated by legal and operational constraints—limits the realism and behavioral unpredictability of our findings. While this approach allows for a reproducible simulation of compliance and fairness metrics, it cannot fully capture field-level idiosyncrasies, such as adversarial noise or operational drift. The recent literature suggests that a hybrid evaluation on both synthetic and anonymized real-world data yields more robust conclusions [12,13]. As future work, we are seeking collaborations with municipal agencies to access partially anonymized datasets for an external validation of SCRAM.

3.4. Scenario-Based Simulation and Compliance Scoring

Nine simulation scenarios were created by varying the threshold (

τ \in {0.90, 0.94, 0.97}

) and storage periods (

δ \in {30, 90, 180}

days). Each scenario was evaluated for the following:

Detection metrics: TPR, FPR.
Fairness metrics: SPD, DIR.
Compliance score: a five-point rubric based on KVKK and court precedents.

We define the compliance score S as

S = \sum_{i = 1}^{n} w_{i} \cdot c_{i}

(1)

where

w_{i}

is the weight and

c_{i}

the compliance score for the i-th criterion.

Fairness metrics are computed as

SPD = P (\hat{Y} = 1 | A = 1) - P (\hat{Y} = 1 | A = 0)

(2)

DIR = \frac{P (\hat{Y} = 1 | A = 0)}{P (\hat{Y} = 1 | A = 1)}

(3)

Additional Example for Scenario S5: For Scenario S5 (threshold $τ = 0.94$ , retention $δ = 90$ days), the scoring reflects moderate legal risks:

Lawfulness, Purpose Limitation, Data Minimization, Data Accuracy, Definition of Personal Data: Fully satisfied, $c_{1}, c_{2}, c_{3}, c_{4}, c_{6} = 1$ .
Data Retention Period: 90 days, partially compliant with KVKK, $c_{5} = 0.5$ .
Proportionality: Partially proportionate per Constitutional Court ruling, $c_{7} = 0.5$ .
Transparency and Notification: Minor gaps, $c_{8} = 0.5$ .

This yields

S = 1 + 1 + 1 + 1 + 0.5 + 1 + 0.5 + 0.5 = 6.5

, normalizing to

S_{normalized} = (6.5 / 8) \cdot 5 = 3.25 \approx 3.5

, consistent with Table 5.

3.5. Compliance Scoring Details

To address reviewer concerns regarding the transparency and justification of the compliance scoring and weighting schema, we provide a comprehensive explanation below, including a detailed rubric and example calculations.

The composite compliance score (S) in the SCRAM framework evaluates adherence to legal and judicial requirements through eight criteria: five derived from KVKK, (Article 4) and three based on domestic legal precedents. These criteria, listed below, ensure that the framework captures both statutory obligations and judicial interpretations relevant to LPR systems:

KVKK Criteria:
- Lawfulness: Data processing must have a documented legal basis (e.g., KVKK Article 5).
- Purpose Limitation: Data processing must be restricted to clearly defined purposes.
- Data Minimization: Only necessary data should be collected and processed.
- Data Accuracy: Data must be regularly updated to ensure accuracy.
- Data Retention Period: Retention periods must comply with legal limits.
Legal Precedent Criteria:
- Definition of Personal Data: License plate data must be treated as personal data (Council of State, E. 2014/4562).
- Proportionality: Retention periods must be proportionate (Constitutional Court, B.No 2018/30296).
- Transparency and Notification: Notification obligations to data subjects must be fulfilled.

Each criterion

c_{i}

(where

i = 1, \dots, 8

) is scored on a scale of 0 to 1: 1 for fully satisfied, 0.5 for partially satisfied, and 0 for not satisfied. The scoring rubric, detailed in Table 3, specifies the conditions for each score based on scenario-specific configurations (e.g., detection threshold

τ

and retention period

δ

). All criteria are assigned an equal weight (

w_{i} = 1

), reflecting their balanced importance in the Turkish regulatory and judicial context, as neither KVKK Article 4 nor judicial precedents prioritize any criterion over others.

The composite compliance score S is computed as

S = \sum_{i = 1}^{8} c_{i}

(4)

and normalized to a 0–5 scale for presentation (i.e.,

S_{normalized} = (S / 8) \cdot 5

), as shown in Table 5.

Example Calculation: For Scenario S3 (threshold

τ = 0.97

, retention

δ = 30

days), the scoring is as follows:

Lawfulness: Fully documented legal basis (KVKK Art. 5), $c_{1} = 1$ .
Purpose Limitation: Purpose strictly defined (traffic enforcement), $c_{2} = 1$ .
Data Minimization: Only license plate data collected, $c_{3} = 1$ .
Data Accuracy: Regular updates ensured, $c_{4} = 1$ .
Data Retention Period: 30 days, compliant with KVKK, $c_{5} = 1$ .
Definition of Personal Data: Plate data treated as personal per Council of State ruling, $c_{6} = 1$ .
Proportionality: Retention proportionate per Constitutional Court ruling, $c_{7} = 1$ .
Transparency and Notification: Minor gaps in public notification procedures, $c_{8} = 0.5$ .

This yields

S = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 0.5 = 7.5

, which normalizes to

S_{normalized} = (7.5 / 8) \cdot 5 = 4.5

, as reported in Table 5.

Additional Example for Clarity: For Scenario S7 (threshold

τ = 0.90

, retention

δ = 180

days), the scoring reflects increased legal risks:

Lawfulness, Purpose Limitation, Data Minimization, Data Accuracy, Definition of Personal Data: Fully satisfied, $c_{1}, c_{2}, c_{3}, c_{4}, c_{6} = 1$ .
Data Retention Period: 180 days exceeds KVKK limits, $c_{5} = 0$ .
Proportionality: Disproportionate per Constitutional Court ruling, $c_{7} = 0$ .
Transparency and Notification: Minor gaps, $c_{8} = 0.5$ .

The scenario-based compliance scores for all simulations are reported in Table 5. This methodology ensures transparency and facilitates adaptation to other regulatory frameworks, such as the EU AI Act, by modifying the criteria or weights as needed.

3.6. SCRAM Model and Diagram

The SCRAM processes system settings through a four-stage decision tree:

Compliance Layer: Legislative checklist.
Performance Layer: TPR/FPR thresholds.
Bias Layer: SPD, DIR tolerance bands.
Policy Layer: Scenario labels (Accept, Revise, Reject).

Figure 3 presents the structural flow of the SCRAM, designed to systematically evaluate AI-enabled surveillance deployments through a multi-layered decision logic. The model begins with a Compliance Layer, where legal prerequisites—derived from KVKK Article 4 and EU AI Act provisions—are assessed to determine whether the scenario meets a minimum compliance score (≥4 out of 5). Scenarios failing this threshold are directly labeled as Reject.

If legal compliance is adequate, the analysis proceeds to the Performance Layer, where detection metrics (TPR, FPR) are evaluated. Thresholds of TPR

\geq 0.95

and FPR

\leq 0.05

are required to ensure technical robustness. Scenarios not meeting this benchmark but being legally compliant are flagged as Revise, indicating the need for algorithmic tuning.

Subsequently, the Bias Layer evaluates fairness metrics—SPD (SPD

< 0.10

) and DIR (DIR

\in [0.8, 1.25]

)—to determine whether demographic parity is preserved. Failure at this layer also results in a Revise label, with the implication that bias mitigation strategies (e.g., data balancing, fairness-aware training) should be employed.

Scenarios passing all layers are finally labeled as Accept, making them candidates for deployment or further certification. This layered structure ensures that operational efficiency is always contextualized within ethical and legal constraints, thus aligning real-world system configurations with normative benchmarks.

3.7. Statistical Testing

We applied Kruskal–Wallis H-tests to assess variance in bias and compliance scores across scenarios. Bonferroni correction was used to adjust for multiple comparisons. Effect size (

ϵ^{2}

) and confidence intervals are reported in Section 4.

Table 4 presents the fundamental configuration variables of the PTS under investigation. It outlines the technical setup, including the number of fixed and mobile cameras, image resolution, default threshold levels (

τ

), and default retention periods defined by local administrative protocols. The data also specifies the institutional roles of different stakeholders, such as the national police (EGM), municipal enforcement units, and local traffic authorities. By laying out these baseline parameters, Table 1 sets the foundation for understanding how legal and operational constraints shape the simulation scenarios and SCRAM model outcomes in later sections.

In Table 5, we consolidate key performance metrics and compliance evaluations across nine distinct simulation scenarios. It presents variations in the detection threshold (

τ

), data retention period, TPR, FPR, and bias metrics such as SPD and DIR. The inclusion of a composite compliance score (on a 0–5 scale) allows for a direct comparison of scenarios under legal/ethical constraints. The results demonstrate that although higher thresholds improve accuracy, extended retention periods markedly degrade compliance. Table 2 is instrumental in illustrating how technical configurations impact both fairness and legal conformity, forming the empirical backbone of the SCRAM decision model.

4. Results

This section addresses the following main research questions:

How do different threshold and retention period configurations affect detection performance, fairness, and compliance in LPR systems?
What are the trade-offs between optimizing for legal compliance and minimizing algorithmic bias?
Are there operational scenarios that jointly satisfy technical, fairness, and regulatory criteria?

4.1. Overview of Detection Performance

Figure 4 shows the detection performance across the nine scenarios, with the detection performance improving consistently with an increasing threshold

τ

. Scenario S3 (

τ = 0.97

, 30 days retention) achieved the highest TPR (0.97) and lowest FPR (0.03). Conversely, Scenario S1 (

τ = 0.90

) recorded the lowest TPR (0.92) and highest FPR (0.08), demonstrating the sensitivity of accuracy metrics to threshold variation.

4.2. Compliance Score Trends

In Figure 5, the KVKK compliance score declines as retention period increases. While S1–S3 with 30-day storage scored 4.5 out of 5, the same thresholds with 180-day storage (S7–S9) scored only 2.0. This decline reflects legal risks associated with excessive data retention, consistent with court guidance.

4.3. Bias Metric Distribution

Figure 6 shows that SPD values across all scenarios remained below the threshold of 0.10, with a slight decreasing trend as

τ

increased. DIR values were within the accepted range (0.8 to 1.25), with the lowest bias observed in S3. This suggests higher thresholds help mitigate regional or vehicle-type-based disparities.

4.4. Statistical Testing Results

As shown in Table 6, tests confirmed significant variance in both SPD (

H = 7.38

,

p < 0.05

) and compliance scores (

H = 11.64

,

p < 0.01

) across scenarios. Bonferroni-adjusted pairwise comparisons indicated that 180-day retention groups differ significantly from 30- to 1-day groups in terms of compliance (

ϵ^{2} = 0.42

).

4.5. SCRAM Decision Outputs

Figure 7 shows mappings of each scenario based on its SPD and corresponding compliance score. The three color-coded regions reflect SCRAM model thresholds: scenarios with compliance

\geq 4

and SPD

< 0.10

are marked as Accept, whereas those with lower scores fall into Revise or Reject zones. Scenario S3 is positioned in the optimal upper-left quadrant—indicating low bias and high compliance—while S7–S9 cluster in the bottom-right quadrant, representing a heightened regulatory risk.

4.6. Optimal Scenario Identification

Figure 8 depicts a comparative bar chart that synthesizes three performance dimensions—TPR,

1 -

SPD, and compliance—across all scenarios. Scenario S3 is visually distinguished due to its superior balance across all metrics, making it the most robust configuration in terms of both technical and legal criteria. By integrating bias and utility, this figure communicates the multi-objective optimization rationale underpinning SCRAM’s decision logic.

4.7. Bias and Compliance Metric Results

To illustrate the operational implications of the SCRAM framework, we present computed values for SPD, DIR, and KVKK compliance scores across six representative scenarios. Table 7 summarizes these calculations, based on simulated probability distributions derived from region (urban vs. periphery) and vehicle-type (private vs. commercial) groupings.

Given

P (\hat{Y} = 1 | urban) = 0.82

and

P (\hat{Y} = 1 | periphery) = 0.75

, we compute the following:

SPD = 0.82 - 0.75 = 0.07 \Rightarrow within acceptable range .

Similarly, with

P (\hat{Y} = 1 | private) = 0.66

and

P (\hat{Y} = 1 | commercial) = 0.71

, we find the following:

DIR = \frac{0.66}{0.71} \approx 0.93 \Rightarrow within fairness bounds (0.8 \leq DIR \leq 1.25) .

As

KVKK Score = 4.5

, and both fairness metrics fall within thresholds, the scenario is labeled as Accept. Table 7 shows results for all six scenarios.

These results demonstrate that technical performance improvements (e.g., higher TPR and lower FPR) do not necessarily imply normative acceptability. Rather, bias and compliance must be simultaneously satisfied to meet deployment criteria under the SCRAM model.

Figure 9 illustrates the trade-off space between fairness (as measured by SPD) and legal compliance (KVKK score). Scenarios falling below both thresholds (SPD

< 0.10

, KVKK

\geq 4.0

) lie in the upper-left quadrant, satisfying the SCRAM model’s acceptance conditions. This visualization helps identify edge cases where technical fairness is achieved, yet regulatory adequacy is lacking.

Figure 10 presents a heatmap view of the normalized values for SPD, DIR, and KVKK scores across the six scenario configurations. Scenarios S1 and S2 demonstrate strong compliance and fairness metrics, while S3 through S6 show deficiencies in at least one dimension, justifying their “Revise” status. The visual representation aids in the comparative diagnosis of legal/technical gaps.

Interaction Effects and Multi-Factor ANOVA

We thank the reviewer for the suggestion to perform a multi-factor ANOVA to assess the interaction effects between the detection threshold (

τ

) and data retention period (

δ

) on compliance and fairness outcomes. In our current simulation design, each (

τ

,

δ

) configuration produces a single scenario result, as shown in Table 5, without repeated measurements per cell. Consequently, classical multi-factor ANOVA cannot be directly applied, since there is no within-group variance to estimate interaction effects robustly.

Nevertheless, as visually evident in Table 5 and the associated figures, the main effects of both the threshold and retention period are systematic and monotonic: increasing

δ

(retention) consistently reduces the KVKK compliance score across all threshold values, while increasing

τ

improves detection performance and reduces bias. No significant interaction beyond these additive effects was detected in the simulated data.

We recognize that a larger set of repeated or bootstrapped simulations would enable a more rigorous multi-factor statistical analysis. This is a limitation of the present experimental design. We have added this as a future work direction in the discussion (Section 5).

4.8. Summary of Findings Relative to Research Questions

The experimental results address the main research questions as follows:

Impact of Configurations: Scenarios S1–S9 demonstrate that higher thresholds ( $τ = 0.97$ ) improve TPR and reduce FPR (Figure 4), while longer retention periods ( $δ = 180$ days) significantly lower KVKK compliance scores (Figure 5).
Trade-offs: Scenarios with high technical performance (e.g., S3, S6, S9) sometimes fail to meet fairness or compliance thresholds (Table 7), highlighting the need for balanced optimization.
Optimal Scenarios: Scenario S3 ( $τ = 0.97$ , $δ = 30$ days) satisfies all criteria (TPR $\geq 0.95$ , SPD $< 0.10$ , KVKK $\geq 4.0$ ), making it the optimal configuration (Figure 8).

These findings confirm that technical performance, fairness, and compliance must be co-optimized to ensure normatively acceptable LPR deployments.

5. Discussion

This section interprets the experimental results, compares them with prior research, and discusses policy implications and limitations of the SCRAM framework. It highlights the need for integrated legal/technical evaluations in AI surveillance.

The SCRAM framework’s contribution lies in its integrated and scenario-sensitive approach to evaluating AI-enabled surveillance systems. Unlike traditional audit tools that focus solely on technical or legal performance, SCRAM offers a multidimensional rubric that incorporates fairness metrics, retention policy alignment, and proportionality mandates.

First, the results suggest that regulatory compliance alone is not a sufficient proxy for fairness. Several scenarios meet the legal retention requirements (KVKK Score

\geq 3

) but fail to avoid regional or functional bias (SPD

> 0.10

or DIR

\notin [0.8, 1.25]

). This reinforces the growing literature on “formal fairness gaps,” where lawful AI deployments still perpetuate statistical inequities.

Second, the graphical tools used—scatter plots and metric heatmaps—demonstrate how interactive visualizations can reveal patterns invisible in tabular audits. For instance, the SPD-KVKK quadrant plot highlights trade-offs where optimization on one axis may entail degradation on another.

Third, the proposed decision rule—accept only if both fairness and compliance metrics are satisfied—aligns with risk-tier classifications in the EU AI Act. However, the SCRAM model further refines this by enabling scenario-by-scenario testing of parameter variation (e.g., decision threshold

τ

or retention length

δ

).

SCRAM’s modularity offers adaptation potential. In jurisdictions outside Türkiye, the compliance layer can be parameterized with GDPR, CCPA, or other normative frameworks. The bias layer could be extended to intersectional attributes such as race and gender where available.

This study demonstrates that ethical, legal, and technical assessments should not be siloed. Instead, integrative models like SCRAM offer promising pathways for aligning AI deployment with human rights and democratic oversights.

5.1. Interpretation of Findings

The results reveal a multidimensional interplay between technical configuration, legal compliance, and fairness in the deployment of AI-enabled PTS. The consistent increase in TPR and concurrent decrease in FPR with higher threshold values (

τ = 0.97

) illustrate the critical role of algorithmic calibration in achieving operational accuracy. Similar trends have been documented in urban traffic surveillance studies across Europe [10], confirming that threshold tuning is a cost-effective optimization lever. However, this technical improvement must be balanced against the legal dimension: longer retention durations (e.g., 180 days) significantly reduced compliance scores, aligning with Turkish Constitutional Court rulings and comparative analyses of data minimization clauses in the GDPR and KVKK frameworks [5,6].

Importantly, SPD and DIR metrics remained within acceptable bounds—below 0.10 and within the 0.8–1.25 range, respectively—affirming that performance enhancements do not necessarily exacerbate bias when configurations are responsibly managed. Comparable outcomes were observed in fairness audits of LPR deployments in Latin America and Southeast Asia, where systemic bias was contained through pre-deployment testing [9,14]. Our findings reinforce the need for balanced co-optimization of detection accuracy and normative alignment, a view increasingly endorsed by both technical and regulatory communities [8,11].

These results suggest that a bias-aware threshold configuration, coupled with legally bounded data retention, is not merely a trade-off but a feasible dual objective. Our scenario simulations empirically validate this point within the Turkish context—where high-precision deployments have historically lacked structured compliance oversights. The SCRAM framework offers a scalable template to bridge this gap.

5.2. Comparison with Prior Research

The proposed SCRAM framework expands the boundaries of previous studies by systematically integrating detection accuracy, fairness metrics, and legal compliance scores into a unified evaluation schema. While [7,8] have addressed algorithmic efficiency and isolated fairness metrics, respectively, their frameworks lack normative embedding and regulatory scoring. By contrast, our work builds upon calls for multi-objective optimization in ethical AI design [9,10], introducing a practical instantiation tailored to Türkiye’s legislative context.

This research also diverges from earlier fairness-oriented simulation efforts [14], which focus predominantly on facial recognition or facial biometrics, often overlooking vehicular data modalities such as LPR. Our inclusion of the retention period as a key legal factor draws from the emerging literature on surveillance minimalism [6], where storage duration is treated as a first-order variable rather than a passive constraint. Furthermore, our integration of SPD and DIR with scenario-based compliance scoring aligns with the broader shift toward fairness-aware benchmarking standards promoted in the 2024 IEEE Ethics Guidelines for Intelligent Systems [15].

This study complements recent privacy-enhancing surveillance proposals [11] by demonstrating how legal and ethical objectives can be proactively modeled prior to system deployment. In doing so, it addresses the crucial gap between high-level regulatory aspiration and ground-level implementation, thereby contributing to the applied corpus of AI governance literature.

5.3. Global Policy Implications and Transferability

Beyond the specific domestic legal context and the comparative analysis with the EU AI Act presented in Section 2.6, it is crucial to position the SCRAM framework within the broader landscape of global AI governance initiatives. The layered approach of SCRAM, which systematically evaluates compliance, fairness, and performance, resonates strongly with universal principles championed by organizations such as UNESCO (e.g., Recommendation on the Ethics of AI [16]), OECD (e.g., AI Principles [17]), and NIST (e.g., AI Risk Management Framework [18]). These influential frameworks consistently emphasize foundational tenets, including transparency, accountability, data minimization, and algorithmic fairness, all of which are integral components directly addressed by SCRAM’s design. This alignment underscores SCRAM’s potential as a transferable and adaptable tool for navigating complex ethical and regulatory requirements across diverse international jurisdictions, enhancing its broader policy implications.

5.4. Policy Implications

The findings of this study hold significant practical relevance for policy formulation and implementation. First, our scenario results demonstrate that longer retention periods correlate with lower compliance scores. This supports policy recommendations that advocate for stricter upper bounds on data retention windows, consistent with recommendations from European Data Protection Board (EDPB) guidelines [19]. Legislators and procurement officers should enforce these limits through binding service-level agreements with vendors.

Second, fairness metrics such as SPD and DIR should be embedded into public procurement and evaluation criteria. In line with calls by [20,21], auditing frameworks must be expanded beyond traditional privacy impact assessments (PIAs) to include fairness-aware compliance matrices.

Third, national regulatory bodies should pilot algorithmic pre-certification programs that leverage models like SCRAM. This would align Türkiye’s enforcement regime with the tiered risk-based approaches in the EU AI Act [3]. It would also provide a replicable mechanism for cities or ministries deploying new AI systems in sensitive domains such as traffic enforcement and urban surveillance.

Lastly, this research urges multi-stakeholder dialogue—bringing together engineers, legal scholars, municipal actors, and civil society—to iterate jointly on compliance standards. Recent consensus reports [22] highlight that no single metric or model can capture the full complexity of lawful and ethical deployment; rather, ongoing evaluation, transparency and adaptation are essential components of resilient policy infrastructure.

5.5. Limitations and Future Work

This study has several limitations that suggest fruitful directions for future inquiry. First, the reliance on synthetic datasets—although necessary due to access constraints—means that behavioral irregularities, adversarial samples, or real-world noise could not be fully modeled. Comparative studies using anonymized production-grade logs, such as those examined in [12,13], would improve external validity.

Second, temporal variation (e.g., differences in PTS effectiveness during night-time vs. day-time conditions) was excluded for parsimony but represents an important avenue for fairness-aware deployment research. Future simulation protocols should integrate diurnal and seasonal fluctuations as fairness modifiers [23].

Third, legal scoring was limited to domestic norms (KVKK and AYM/Danıştay rulings). A comparative legal benchmark, incorporating AI-specific clauses from the EU AI Act and OECD AI Principles, would improve generalizability of the compliance scoring architecture [24]. While our current fairness assessment utilizes SPD and DIR for individual attributes, like the region and vehicle type, we acknowledge that the framework currently overlooks intersectional bias across compound attributes. Analyzing such deeper disparities, which arise from the interplay of multiple sensitive characteristics, requires more complex modeling and often necessitates richer, more granular datasets. Future iterations of the SCRAM framework will aim to incorporate advanced intersectional fairness metrics and methodologies to provide a more nuanced understanding of bias in AI surveillance systems, building upon the emerging literature in this critical area.

The current SCRAM implementation assumes static configurations for detection and retention parameters. However, next-generation smart surveillance environments are increasingly characterized by adaptive systems capable of real-time policy or threshold updates. Incorporating dynamic decision-making mechanisms, such as real-time feedback loops or reinforcement learning protocols, represents a promising extension for future SCRAM iterations, as demonstrated in [25,26]. Such integration could significantly improve both compliance and accuracy under changing operational constraints.

A further limitation of the current SCRAM framework is its assumption of ideal environmental conditions, without explicitly accounting for real-world factors, such as motion blur, occlusions, or low-light scenarios. These adverse imaging conditions are prevalent in public surveillance environments and can critically degrade the performance of LPR systems, leading to increased false positives or negatives which may, in turn, exacerbate existing biases and invalidate compliance conclusions. Recent studies demonstrate that advanced deep learning and image enhancement methods significantly improve robustness under such conditions [27,28]. Future extensions of SCRAM will explore methodologies for evaluating system robustness under such challenging conditions, referencing these and similar techniques to comprehensively assess the impact of environmental factors on both operational performance and normative outcomes.

While the SCRAM framework is designed with modularity to conceptually enable its portability to diverse legal and regulatory contexts—as evidenced by our multi-layered approach to compliance and the use of generalizable fairness metrics—we acknowledge that its empirical application outside Türkiye remains speculative without concrete demonstrations. Future research will prioritize the empirical validation of the SCRAM model in other jurisdictions, particularly under evolving frameworks, like the EU AI Act, to solidify its claimed portability and broader applicability in global AI governance.

5.6. Distinguishing Compliance from Broader Ethical Considerations

While the SCRAM framework rigorously quantifies regulatory compliance based on established legal frameworks, such as KVKK and relevant judicial precedents, it is crucial to acknowledge the nuanced conceptual relationship between ‘compliance’ and ‘ethics’ within AI governance. A system may be designed to adhere strictly to legal requirements yet still raise ethical concerns that extend beyond codified law. This can occur, for example, when a system’s aggregate impact, though technically lawful, disproportionately affects certain groups in unforeseen ways, or when underlying data collection methods—while permissible—are perceived as intrusive by the public. Legal standards often provide a baseline for acceptable conduct, but ethical considerations—encompassing broader societal values, human rights, and principles of fairness and accountability—frequently evolve more rapidly than statutes. SCRAM contributes to this broader discourse by providing a transparent, measurable framework for assessing key normative attributes such as fairness and legal conformity. Nevertheless, it also implicitly highlights the need for continual critical scrutiny and public deliberation to ensure that technologically compliant systems also align with evolving ethical expectations and robust AI governance principles.

6. Conclusions

This section summarizes the key findings of the SCRAM framework, emphasizing its contributions to AI governance and its potential for cross-jurisdictional adaptations. It outlines future research directions to enhance the framework’s applicability.

This study introduced and validated the SCRAM framework as a multidimensional tool for evaluating AI-enabled surveillance systems. Through simulation-based experimentation and comparative scenario analysis, we demonstrated how seemingly high-performing systems can fall short of normative acceptability when fairness and legal compliance are assessed in isolation.

Our findings underscore three key implications. First, metrics such as SPD and DIR must be computed and interpreted in conjunction with jurisdiction-specific data protection laws, like KVKK. Legal adequacy alone cannot ensure algorithmic fairness; likewise, fairness metrics cannot substitute for formal accountability mechanisms.

Second, scenario-specific parameter variation—such as adjusting classification thresholds (

τ

) and retention durations (

δ

)—significantly affects both legal risk exposure and fairness profiles. Policymakers and system architects must consider these dimensions jointly rather than sequentially.

Third, the SCRAM framework provides a scalable approach for cross-national adaptation. By plugging in regulatory standards from other regimes (e.g., EU AI Act, GDPR), SCRAM can become a portable risk assessment layer for governments, private vendors, and civil society watchdogs.

In conclusion, as AI continues to mediate key functions in public safety and surveillance, there is a growing need for frameworks that integrate legal, ethical, and technical perspectives. SCRAM responds to this need by enabling evidence-based decisions on deployment, mitigation, and governance. Future work should explore embedding SCRAM in real-time policy dashboards and expanding its bias metrics to include intersectional and temporal dimensions.

Author Contributions

K.K., methodology; K.K. and I.B., software; K.K., S.K. and I.B., validation; K.K. and S.K., investigation; K.K., S.K. and I.B., writing—original draft preparation; K.K., S.K. and I.B., writing—review and editing; K.K., S.K. and I.B., visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially financed by the European Union-NextGenerationEU through the National Recovery and Resilience Plan of the Republic of Bulgaria, project № BG-RRP-2.013-0001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Synthetic Dataset Generation

The synthetic dataset used in this study was generated using a stratified Monte Carlo simulation to reflect plausible distributions in Turkish LPR systems. A total of 10,000 data points were created with annotations for region type, vehicle category, and detection match.

The simulation followed the following assumptions:

Region distribution: urban (50%), suburb (30%), periphery (20%).
Vehicle types: private (70%), commercial (30%).
Match probability: 10% (vehicles on a watchlist).

Python 3.10 and NumPy 1.26.4 were used. Reproducibility was ensured by setting a fixed seed. Below is the core code:

import numpy as np
import pandas as pd

np.random.seed(42)

N = 10000
regions = np.random.choice([’urban’, ’suburb’, ’periphery’],p=[0.5, 0.3, 0.2], size=N)
vehicle_types = np.random.choice([’private’, ’commercial’], p=[0.7, 0.3], size=N)
matches = np.random.choice([1, 0], p=[0.1, 0.9], size=N)

df = pd.DataFrame({
’region’: regions,
’vehicle_type’: vehicle_types,
’match’: matches
})

This dataset was used in nine simulation scenarios to assess detection metrics, fairness measures (SPD, DIR), and compliance scores under varying thresholds and retention durations. Full code and simulation documentation are available upon request.

Appendix B. Synthetic Dataset Construction

The synthetic dataset used in this study was generated through a Monte Carlo simulation and is entirely artificial, anonymized, and reproducible. It was constructed based on publicly available parameters, sectoral reports, and academic literature to simulate real-world PTS deployment conditions in Türkiye.

Appendix B.1. Data Sources and Parameterization

The simulation parameters and distributions were derived from the following sources:

Public Reports: The KGYS Activity Report (2024) provided data on the number of fixed and mobile camera units deployed nationwide, the operational coverage of LPR systems, and high-level performance summaries.
Legal Guidelines: Retention periods and access control roles were based on Turkish Ministry of Interior circulars (e.g., 2020/12) and related legislation governing personal data under KVKK.
Academic Literature: Recent AI surveillance benchmarks (e.g., [7,8]) informed the distribution of match/no-match outcomes, decision thresholds ( $τ$ ), and fairness metric expectations (SPD, DIR).

Appendix B.2. Simulation Setup

The dataset comprises 10,000 entries with the following structure:

Match: Binary label indicating whether the plate matched a predefined watchlist.
Region: Urban, suburban, or periphery region categories.
Vehicle_type: Private or commercial vehicle classification.

Appendix B.3. Compliance Scoring Code

The compliance scoring logic is implemented in Python, with pseudocode provided below:

def compute_compliance_score(scenario):
criteria = [’lawfulness’, ’purpose_limitation’, ..., ’transparency’]
scores = []
for criterion in criteria:
score = evaluate_criterion(scenario, criterion) # Returns 0, 0.5, or
1
scores.append(score)
S = sum(scores)
S_normalized = (S / len(criteria)) * 5
return S_normalized

References

Polack, P. Beyond algorithmic reformism: Forward engineering the designs of algorithmic systems. Big Data Soc. 2020, 7, 2053951720913064. [Google Scholar] [CrossRef]
Jacobs, G.; Van Houdt, F.; Coons, G. Studying Surveillance AI-cologies in Public Safety: How AI Is in the World and the World in AI. Surveill. Soc. 2024, 22, 145–165. Available online: https://www.erudit.org/en/journals/survsoc/2024-v22-n2-survsoc09422/1112225ar/ (accessed on 1 June 2025). [CrossRef]
European Parliament and Council. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Off. J. Eur. Union 2024. Available online: https://eur-lex.europa.eu/eli/reg/2024/1689/oj (accessed on 1 August 2024).
INTERPOL Innovation Centre. Responsible Use of AI for Law Enforcement; Version 1.0; INTERPOL: Singapore, 2023; Available online: https://www.interpol.int/en/Crimes/Cybercrime/Responsible-use-of-AI-for-law-enforcement (accessed on 1 June 2025).
T.C. İçişleri Bakanlığı, Strateji Geliştirme Başkanlığı. 2024 Yılı İdare Faaliyet Raporu. 2025. PDF Available via “İdare Faaliyet Raporları” Page. KGYS Data (e.g., Number of Fixed Cameras, Number of Mobile Units, PTS Rates) Are Provided in the “Gözetim Sistemleri” Section. Available online: https://www.mfa.gov.tr/data/BAKANLIK/disisleri-bakanligi-2024-yili-idare-faaliyet-raporu.pdf (accessed on 1 June 2025).
Gökhan, S. Kişisel Verilerin Korunması Kanunu ve Yapay Zeka Uygulamaları: Hukuki Riskler ve Çözüm Önerileri. Bilişim Hukuku Derg. 2023, 5, 45–62. Available online: https://dergipark.org.tr/tr/pub/bhd (accessed on 1 June 2025).
Aydın, Z.G.; Kazanç, M. Using Artificial Intelligence in the Security of Cyber Physical Systems. Alphanumeric J. 2023, 11, 359–372. Available online: https://dergipark.org.tr/en/doi/10.17093/alphanumeric.1404181 (accessed on 1 June 2025). [CrossRef]
Wei, Z.; Zhou, H.; Zhou, R. Risk and Complexity Assessment of Autonomous Vehicle Testing Scenarios. Appl. Sci. 2024, 14, 9866. [Google Scholar] [CrossRef]
Spivack, J.; Garvie, C. A Taxonomy of Legislative Approaches to Face Recognition in the United States. AI Now Institute. 2020. Available online: https://ainowinstitute.org/wp-content/uploads/2023/09/regulatingbiometrics-spivack-garvie.pdf (accessed on 1 June 2025).
Smith, G.J.D. The Politics of Algorithmic Governance in the Black Box City. Big Data Soc. 2020, 7, 2053951720933989. [Google Scholar] [CrossRef]
Ma, B.; Wang, X.; Lin, X.; Jiang, Y.; Sun, C.; Wang, Z. Location Privacy Threats and Protections in Future Vehicular Networks: A Comprehensive Review. arXiv 2023, arXiv:2305.04503. Available online: https://arxiv.org/abs/2305.04503 (accessed on 1 June 2025). [CrossRef]
Pradhan, S.K.; Jans, M.; Martin, N. Getting the Data in Shape for Your Process Mining Analysis: An In-Depth Analysis of the Pre-Analysis Stage. ACM Trans. Manag. Inf. Syst. 2025, 159. [Google Scholar] [CrossRef]
Marda, V.; Narayan, S. Data in New Delhi’s Predictive Policing System. In Proceedings of the 2020 conference on fairness, accountability, and transparency, Barcelona, Spain, 27–30 January 2020; pp. 317–324. [Google Scholar] [CrossRef]
McStay, A. Emotional AI, Soft Biometrics and the Surveillance of Emotional Life: An Unusual Consensus on Privacy. Big Data Soc. 2020, 7, 1–13. [Google Scholar] [CrossRef]
Adamson, G.; Havens, J.C.; Chatila, R. Designing a Value-Driven Future for Ethical Autonomous and Intelligent Systems. Proc. IEEE 2019, 107, 518–525. [Google Scholar] [CrossRef]
UNESCO. Recommendation on the Ethics of Artificial Intelligence. UNESCO Policy Paper. 2021. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000381137 (accessed on 1 June 2025).
Organisation for Economic Co-Operation and Development (OECD). OECD Principles on Artificial Intelligence. 2019. Available online: https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449 (accessed on 1 June 2025).
National Institute of Standards and Technology (NIST). AI Risk Management Framework (AI RMF 1.0). 2023. Available online: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf (accessed on 1 June 2025).
European Data Protection Board. Guidelines 3/2019 on Processing of Personal Data Through Video Devices. 2020. Available online: https://edpb.europa.eu/our-work-tools/general-guidance/guidelines-recommendations-best-practices_en (accessed on 1 June 2025).
Toreini, E.; Mehrnezhad, M.; van Moorsel, A. Fairness as a Service (FaaS): Verifiable and privacy-preserving fairness auditing of machine learning systems. Int. J. Inf. Secur. 2024, 23, 567–582. Available online: https://link.springer.com/article/10.1007/s10207-023-00774-z (accessed on 1 June 2025). [CrossRef]
Kaminski, M.E.; Malgieri, G. Algorithmic Impact Assessments under the GDPR. Eur. J. Law Technol. 2020, 11. Available online: https://ejlt.org/index.php/ejlt/article/view/709 (accessed on 1 June 2025).
Kashefi, P.; Kashefi, Y. Shaping the future of AI: Balancing innovation and ethics in global regulation. Unif. Law Rev. 2024, 29, 524–540. [Google Scholar] [CrossRef]
Wang, Y.; He, Z.; Xing, W.; Lin, C. Understanding congestion risk and emissions of various travel behavior patterns based on license plate recognition data. Sustainability 2025, 17, 551. [Google Scholar] [CrossRef]
OECD AI Policy Observatory. OECD Framework for the Classification of AI Systems. 2023. Available online: https://oecd.ai/en/classification (accessed on 1 June 2025).
Zhang, S.; Li, J.; Shi, L.; Ding, M.; Nguyen, D.C. Federated learning in intelligent transportation systems: Recent applications and open problems. IEEE Internet Things J. 2023. [Google Scholar] [CrossRef]
Patel, T. Adaptive AI Enforcement in Real-Time Digital Ecosystems. J. Comput. Sci. Technol. Stud. 2025, 7, 399–415. [Google Scholar] [CrossRef]
Wang, Z.; Zheng, L.; Li, G. Design of Enhanced License Plate Information Recognition Algorithm Based on Environment Perception. IEEE Xplore 2025, 13, 38609–38627. [Google Scholar] [CrossRef]
Quraishi, A.; Feyzi, F. Detection and recognition of vehicle licence plates using deep learning in challenging conditions: A systematic review. Int. J. Intell. Syst. Technol. Appl. 2024, 22, 105–150. Available online: https://www.inderscienceonline.com/doi/full/10.1504/IJISTA.2024.139736 (accessed on 1 June 2025). [CrossRef]

Figure 1. I–D–A mapping of legal constraints across the PTS data lifecycle.

Figure 2. Global diffusion of AI-enabled surveillance systems (2015–2025). Note: Data sourced from the AI Surveillance Index (2025) and OECD country reports; projections may vary due to limited data from non-OECD regions, with Türkiye’s high-penetration entry estimated for 2023.

Figure 3. SCRAM flowchart. Each layer applies threshold logic to determine policy labels (Accept, Revise, Reject).

Figure 4. TPR and FPR across scenarios.

Figure 5. KVKK compliance score as a function of data retention period.

Figure 6. SPD and DIR for all scenarios.

Figure 7. SCRAM model decision space by scenario.

Figure 8. Multi-objective analysis highlighting the Scenario S3 as optimal. The metric

1 - SPD

is used to represent fairness, where higher values indicate lower bias, aligning with the goal of minimizing disparities across demographic groups.

Figure 8. Multi-objective analysis highlighting the Scenario S3 as optimal. The metric

1 - SPD

is used to represent fairness, where higher values indicate lower bias, aligning with the goal of minimizing disparities across demographic groups.

Figure 9. SPD vs. KVKK compliance score across scenarios.

Figure 10. SCRAM scenario metric heatmap.

Table 1. Performance and risk trade-offs in recent LPR benchmarks.

Model	$τ$	Precision (%)	Latency (ms)	Legal Risk Score
ResNet-50	0.90	94.2	23	–
YOLOv5	0.95	97.1	28	–
MobileNetV3	0.97	96.3	35	–
EfficientNet-B0	0.90	93.5	21	–
DenseNet-121	0.95	95.0	31	–

Note: Legal risk scores (e.g., compliance with GDPR or KVKK retention limits) are rarely reported in engineering studies, highlighting the normative gap addressed in our SCRAM framework.

Table 2. Comparison of legal principles: KVKK Art. 4 vs. EU AI Act Title III.

Principle	KVKK Article 4	EU AI Act Title III
Lawfulness	General requirement; subject to court review	Mandatory legal basis; tiered risk classification for biometric systems
Purpose Limitation	Explicit in law, but interpretation varies	Enumerated as part of risk assessment and documentation
Data Minimization	Present but often court-enforced post-facto	Required upfront in design (Art. 10: data governance)
Transparency	Notification-based; often delegated	Full model documentation and user-facing explanation required
Retention Limits	No maximums; judicial scrutiny only if challenged	Defined storage ceilings and audit logging requirements
Enforcement	Complaint-driven, non-tiered	Tiered with ex-ante certification and compliance scoring

Table 3. Compliance scoring rubric.

Criterion	Scoring Condition	Score (0–1)
Lawfulness	Documented legal basis (e.g., KVKK Art. 5) fully specified	1
	Legal basis exists but lacks clarity or documentation	0.5
	No legal basis for processing	0
Purpose Limitation	Purpose explicitly defined and strictly limited	1
	Purpose defined but overly broad or vague	0.5
	No defined purpose	0
Data Minimization	Only strictly necessary data collected	1
	Limited unnecessary data collected	0.5
	Excessive or unjustified data collection	0
Data Accuracy	Regular updates ensure data accuracy	1
	Partial updates or minor inaccuracies detected	0.5
	No accuracy checks or significant errors	0
Data Retention Period	Retention $\leq 30$ days	1
	Retention 31–90 days	0.5
	Retention $> 90$ days	0
Definition of Personal Data	Plate data explicitly treated as personal (Council of State 2014/4562)	1
	Partial or ambiguous classification	0.5
	Not treated as personal data	0
Proportionality	Retention period proportionate (Constitutional Court 2018/30296)	1
	Retention partially proportionate	0.5
	Retention disproportionate	0
Transparency and Notification	Full compliance with notification obligations	1
	Minor gaps in notification procedures	0.5
	No notification provided	0

Table 4. PTS system parameters.

Parameter	Value	Source
Camera units	325 fixed, 48 mobile	Istanbul KGYS Report 2023
Threshold ( $τ$ )	0.90 (default), 0.94, 0.97	Device manual
Retention (days)	30, 90, 180	MoI Circular No. 2020/12
Access roles	EGM, Municipality, Jandarma	Protocol 2022-KGYS-07

Table 5. Scenario simulation results.

Scenario	$τ$	Days	TPR	FPR	KVKK Score	SPD	DIR
S1	0.90	30	0.92	0.08	4.5	0.06	0.92
S2	0.94	30	0.95	0.05	4.5	0.05	0.89
S3	0.97	30	0.97	0.03	4.5	0.04	0.87
S4	0.90	90	0.92	0.08	3.5	0.06	0.92
S5	0.94	90	0.95	0.05	3.5	0.05	0.89
S6	0.97	90	0.97	0.03	3.5	0.04	0.87
S7	0.90	180	0.92	0.08	2.0	0.06	0.92
S8	0.94	180	0.95	0.05	2.0	0.05	0.89
S9	0.97	180	0.97	0.03	2.0	0.04	0.87

Table 6. Kruskal–Wallis and Bonferroni results.

Test Type	Metric	H Value	p-Value
Kruskal–Wallis	SPD	7.38	<0.05
Kruskal–Wallis	Compliance	11.64	<0.01
Bonferroni Adj.	Compliance (30 vs. 180 days)	—	<0.01

Table 7. Bias and legal compliance metrics across scenarios.

Scenario	$P (\hat{Y} \| Urban)$	$P (\hat{Y} \| Periphery)$	SPD	DIR	KVKK Score	Decision
S1	0.82	0.75	0.07	0.93	4.5	Accept
S2	0.81	0.76	0.05	0.93	4.5	Accept
S3	0.80	0.74	0.06	0.99	3.5	Revise
S4	0.84	0.72	0.12	0.94	3.0	Revise
S5	0.79	0.74	0.05	0.96	2.5	Revise

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kesgin, K.; Kosunalp, S.; Beloev, I. SCRAM: A Scenario-Based Framework for Evaluating Regulatory and Fairness Risks in AI Surveillance Systems. Appl. Sci. 2025, 15, 9038. https://doi.org/10.3390/app15169038

AMA Style

Kesgin K, Kosunalp S, Beloev I. SCRAM: A Scenario-Based Framework for Evaluating Regulatory and Fairness Risks in AI Surveillance Systems. Applied Sciences. 2025; 15(16):9038. https://doi.org/10.3390/app15169038

Chicago/Turabian Style

Kesgin, Kadir, Selahattin Kosunalp, and Ivan Beloev. 2025. "SCRAM: A Scenario-Based Framework for Evaluating Regulatory and Fairness Risks in AI Surveillance Systems" Applied Sciences 15, no. 16: 9038. https://doi.org/10.3390/app15169038

APA Style

Kesgin, K., Kosunalp, S., & Beloev, I. (2025). SCRAM: A Scenario-Based Framework for Evaluating Regulatory and Fairness Risks in AI Surveillance Systems. Applied Sciences, 15(16), 9038. https://doi.org/10.3390/app15169038

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

SCRAM: A Scenario-Based Framework for Evaluating Regulatory and Fairness Risks in AI Surveillance Systems

Abstract

1. Introduction

1.1. Global Surge in AI-Driven Surveillance

1.2. Domestic Landscape: Türkiye’s KGYS and PTS Infrastructure

1.3. Legal Context: The I–D–A Triad

1.4. Research Gap

1.5. Contributions and Article Structure

Organization of the Paper

2. Literature Review

2.1. Global Proliferation of AI-Enabled Surveillance

2.2. Performance Metrics in PTS Research

2.3. Data Protection Frameworks and Compliance Audits

2.4. Algorithmic Bias and Fairness Metrics

2.5. Anonymization and Privacy-Enhancing Technologies

2.6. Comparative Legislation: KVKK vs. EU AI Act

3. Methodology

3.1. Key Metrics and Notation

3.2. Case Selection and Legal Mapping

3.3. Synthetic Dataset Construction

3.3.1. Simulation Setup and Scenario Selection

3.3.2. Limitations of Synthetic Data

3.4. Scenario-Based Simulation and Compliance Scoring

3.5. Compliance Scoring Details

3.6. SCRAM Model and Diagram

3.7. Statistical Testing

4. Results

4.1. Overview of Detection Performance

4.2. Compliance Score Trends

4.3. Bias Metric Distribution

4.4. Statistical Testing Results

4.5. SCRAM Decision Outputs

4.6. Optimal Scenario Identification

4.7. Bias and Compliance Metric Results

Interaction Effects and Multi-Factor ANOVA

4.8. Summary of Findings Relative to Research Questions

5. Discussion

5.1. Interpretation of Findings

5.2. Comparison with Prior Research

5.3. Global Policy Implications and Transferability

5.4. Policy Implications

5.5. Limitations and Future Work

5.6. Distinguishing Compliance from Broader Ethical Considerations

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. Synthetic Dataset Generation

Appendix B. Synthetic Dataset Construction

Appendix B.1. Data Sources and Parameterization

Appendix B.2. Simulation Setup

Appendix B.3. Compliance Scoring Code

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI