Article

A Swin-Transformer-Based Network for Adaptive Backlight Optimization

1 School of Science, Hubei University of Technology, Wuhan 430068, China
2 Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, China
* Authors to whom correspondence should be addressed.
Symmetry 2026, 18(3), 502; https://doi.org/10.3390/sym18030502
Submission received: 23 January 2026 / Revised: 8 March 2026 / Accepted: 13 March 2026 / Published: 15 March 2026
(This article belongs to the Special Issue Symmetry and Asymmetry in Optimization Algorithms and Control Systems)

Abstract

Mini-LED local dimming systems commonly suffer from luminance discontinuity, halo artifacts, and temporal instability in dynamic scenes. Traditional heuristic-based methods and standard convolutional neural networks often fail to capture long-range spatial dependencies and struggle to balance spatial smoothness, content fidelity, and real-time performance under hardware constraints. To address these challenges, this paper proposes SwinLightNet, an efficient adaptive backlight optimization network tailored for Mini-LED displays. Built upon a Swin Transformer backbone, SwinLightNet integrates five hardware-aware design strategies: (i) a lightweight Swin variant (window size = 8, MLP ratio = 2.0) for efficient global context modeling; (ii) CNN encoder–decoder integration for multi-scale feature extraction; (iii) a partition-level alignment module ensuring spatial consistency; (iv) a backlight constraint module enforcing local luminance consistency and contrast preservation; and (v) a change-aware temporal decision framework stabilizing dynamic sequences. These components jointly resolve the core limitations: global modeling suppresses halo artifacts while preserving content fidelity; the alignment and constraint modules eliminate luminance discontinuity without compromising contrast; and the temporal framework guarantees flicker-free output under motion. Evaluated on DIV2K (static images) and a custom 2K-resolution video dataset (dynamic scenes), SwinLightNet demonstrates robust reconstruction quality while requiring only 1.18 million parameters and 0.088 GFLOPs of computational cost. The results confirm SwinLightNet's effectiveness in holistically addressing spatial, temporal, and hardware constraints, demonstrating strong potential for practical deployment in resource-constrained Mini-LED backlight control systems.

1. Introduction

In recent years, Mini-LED display technology has emerged as a leading solution for high-dynamic-range (HDR) imaging, offering superior contrast ratios and peak brightness compared to conventional liquid crystal display (LCD) systems [1,2,3]. At the heart of Mini-LED displays lies the local dimming algorithm, which dynamically adjusts backlight intensity across hundreds or thousands of dimming partitions to match image content. However, achieving optimal backlight distribution remains a challenging task due to the inherent trade-off among luminance accuracy, structural fidelity, and temporal stability [4,5,6].
Existing local dimming algorithms can be broadly categorized into three classes, each with distinct limitations. First, heuristic rule-based methods, such as maximum luminance [7] and mean luminance [8] algorithms, rely on hand-crafted rules to determine backlight values. While computationally efficient, these methods frequently suffer from severe halo artifacts and luminance discontinuity at partition boundaries, as they lack content-aware adaptation capabilities [9]. Second, CNN-based deep learning methods have been introduced to model the nonlinear mapping between image content and backlight patterns [10,11,12]. Although convolutional neural networks (CNNs) improve local feature extraction, their limited receptive fields prevent effective modeling of long-range dependencies across dimming partitions, resulting in suboptimal global luminance distribution and visible block effects in large uniform regions [13]. Third, reinforcement learning and IoT-based scheduling approaches have been explored for display power management [14,15]. Notable advances include Han et al.’s DDQN-based decomposed graph scheduling for flexible policy learning and real-time responsiveness [16], as well as Salh et al.’s GAN-integrated deep distributed Q-networks for IoT scheduling with improved training stability [17,18]. These works provide valuable technical foundations for hardware deployment, real-time inference, and system-level optimization. However, they primarily target energy efficiency and resource scheduling rather than pixel-level luminance quality, making them less suitable for perceptual-driven backlight control tasks where visual fidelity is the primary objective.
The critical challenge in local dimming is ensuring spatial consistency across adjacent dimming partitions while maintaining temporal coherence across video frames. Long-range dependencies are essential for two reasons. Spatially, light diffusion in Mini-LED panels causes each partition’s luminance to affect neighboring regions; ignoring these correlations leads to visible seams and halo artifacts. Temporally, frame-to-frame backlight fluctuations cause perceptible flickering, especially in dark scenes; modeling temporal dependencies ensures smooth transitions and enhances visual comfort. Standard CNN architectures fail to capture these long-range interactions due to their local convolution operations. This motivates our adoption of the Swin Transformer [19], whose shifted-window self-attention mechanism enables efficient global modeling while maintaining computational feasibility.
In this paper, SwinLightNet is proposed as a novel adaptive backlight optimization network specifically designed for Mini-LED local dimming systems. The main contributions are threefold. First, in terms of task-specific system design for Mini-LED backlight optimization, SwinLightNet incorporates a hierarchical spatio-temporal processing pipeline that explicitly models both spatial luminance correlations and temporal frame-to-frame consistency, with a differentiable temporal constraint module introduced to suppress flickering without sacrificing responsiveness. Second, in terms of task-specific loss design, we propose a multi-component loss function combining L1 pixel loss (λ1 = 1.0), SSIM structural loss (λ2 = 0.5), and smoothness regularization (λ3 = 0.2). This formulation is strategically optimized for backlight control: the L1 term ensures precise luminance prediction at partition boundaries; SSIM preserves structural integrity of bright/dark transitions critical for visual quality; and smoothness regularization suppresses spatial artifacts while maintaining edge sharpness—explicitly resolving the tension between content fidelity, spatial smoothness, and hardware-driven constraints. Third, through comprehensive validation, extensive experiments on the DIV2K dataset and a custom 2K video sequence demonstrate that SwinLightNet outperforms state-of-the-art methods by 2.4 dB in PSNR and 0.035 in SSIM, while maintaining only 1.18 M parameters and 0.083 GFLOPs. These results, corroborated by ablation studies isolating each module’s contribution, provide conclusive evidence that the proposed task-specific system designs and loss design collectively resolve the core challenges of Mini-LED local dimming—luminance discontinuity, halo artifacts, and temporal instability—without compromising real-time deployment feasibility. 
The remainder of this paper is organized as follows: Section 2 describes the proposed method, Section 3 presents experimental results, and Section 4 concludes the work.

2. Adaptive Spatial-Contrast-Enhanced Local Dimming Method Based on Mini-LED Technology

2.1. Overall System Design

To achieve fine-grained backlight modulation for complex image content, this work develops an adaptive backlight optimization network, termed SwinLightNet, based on the Swin Transformer architecture. Driven by local luminance characteristics, the proposed network learns region-level backlight response relationships through multi-scale spatial modeling, enabling a unified formulation and performance enhancement over conventional statistical backlight strategies.
During the spatial modeling stage, the input image is mapped to a partition-level luminance representation, from which an initial backlight estimate is generated via a luminance-aware mapping. This estimation captures the energy distribution of dark and bright regions while incorporating an adaptive luminance adjustment mechanism to provide differentiated responses across varying intensity ranges, thereby enhancing image details without excessive amplification. Subsequently, a local neighborhood constraint is applied to refine the backlight estimation, enforcing spatial consistency and reducing luminance discontinuities at partition boundaries.
To address flicker artifacts caused by temporal variations in backlight in video sequences, SwinLightNet introduces an adaptive temporal fusion strategy based on inter-frame luminance statistics. This strategy jointly characterizes luminance variation trends between adjacent frames and employs dynamic regulation to constrain extreme changes, thereby improving temporal smoothness while preserving content responsiveness.
At the output stage, the optimized backlight distribution is mapped to pixel-level space to construct a continuous full-resolution backlight layer, which is then combined with the original image through luminance-guided fusion and compensation. Through this collaborative optimization process, SwinLightNet consistently produces high-quality backlight distributions across diverse content scenarios, significantly enhancing image contrast and overall visual consistency.
Figure 1 illustrates the overall workflow of the SwinLightNet algorithm. By integrating multiple optimization strategies during backlight computation, the proposed framework preserves image details, improves display quality, enhances resource utilization efficiency, and reduces hardware power consumption.

2.2. Backlight Extraction

In LED display systems, accurate estimation of backlight distribution directly determines image luminance hierarchy and local contrast performance, making backlight modeling a critical component of the display optimization pipeline. As the foundational module of the SwinLightNet framework, the backlight extraction module aims to construct a spatially adaptive backlight representation from the input image, providing reliable input for subsequent constraint enforcement and temporal optimization.
This module performs luminance modeling on a block-level backlight control grid constrained by the physical partition structure of Mini-LED displays. To better align with the high sensitivity of the human visual system to luminance variations, the input RGB image is first converted into the YCrCb color space, and only the luminance component is retained for backlight computation, thereby reducing the influence of chromatic information on backlight estimation.
Subsequently, the luminance channel is divided into r_b × c_b non-overlapping subregions according to a predefined partition scheme, with each region corresponding to a physical LED backlight unit. Given an input image resolution of H × W, the spatial dimensions of each backlight partition are defined as
h_b = \frac{H}{r_b}, \quad w_b = \frac{W}{c_b}
Here, r_b and c_b denote the numbers of backlight partitions along the vertical and horizontal directions, respectively. Through this mapping, pixel-level luminance information is transformed into block-level representations consistent with the hardware structure, laying the foundation for constructing high-fidelity and spatially consistent backlight distributions.
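The partition mapping described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `block_luminance` is invented here, and the block mean is used as the block-level statistic, whereas the network learns a richer representation.

```python
# Sketch of the partition mapping: the luminance channel of an H x W image is
# split into r_b x c_b non-overlapping blocks of size h_b x w_b (h_b = H/r_b,
# w_b = W/c_b), and each block is reduced to a single statistic (here the mean).

def block_luminance(Y, r_b, c_b):
    """Map a pixel-level luminance map (list of lists) to an r_b x c_b grid."""
    H, W = len(Y), len(Y[0])
    h_b, w_b = H // r_b, W // c_b          # partition dimensions
    grid = [[0.0] * c_b for _ in range(r_b)]
    for i in range(r_b):
        for j in range(c_b):
            block = [Y[y][x]
                     for y in range(i * h_b, (i + 1) * h_b)
                     for x in range(j * w_b, (j + 1) * w_b)]
            grid[i][j] = sum(block) / len(block)
    return grid
```

For a 2040 × 1356 input with the 40 × 64 grid used later in the paper, each partition covers roughly 34 × 32 pixels of the luminance channel.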

2.2.1. Network Architecture

The system design of SwinLightNet is driven by three considerations specific to Mini-LED local dimming. First, input efficiency: unlike general image processing tasks that operate on full-resolution RGB images (e.g., 2040 × 1356), backlight control operates on low-resolution luminance partition maps (e.g., 40 × 64 dimming partitions). This motivates a lightweight CNN encoder that processes downsampled grayscale luminance maps rather than full-color images, reducing computational overhead by approximately 75 percent compared to RGB-based approaches. Second, global modeling: standard CNN-based local dimming methods cannot model long-range dependencies because of their limited receptive fields, producing visible block effects and luminance discontinuity across adjacent dimming partitions. The shifted-window self-attention of the Swin Transformer instead lets each dimming partition attend to all other partitions through hierarchical window partitioning, capturing global luminance correlations. The hybrid CNN–Swin design places the Swin module at the bottleneck layer, preserving the local feature extraction of the CNN layers while adding global context modeling where it matters most. Third, real-time deployment: Mini-LED display systems require inference speeds above 60 FPS to match standard refresh rates, so two optimizations are introduced. The first reduces the MLP expansion ratio from the standard 4.0 to 2.0, cutting parameters by 50 percent with minimal performance loss. The second uses a window size of 8 with cyclic shift instead of the standard window size of 7: a window size of 8 is a hardware-friendly power of two, avoids the alignment issues the standard size of 7 causes on low-resolution feature maps, and covers the entire spatial dimension of the low-resolution partition maps, ensuring global context modeling.
These design choices enable SwinLightNet to achieve high FPS on embedded GPU platforms while maintaining state-of-the-art visual quality.
Based on the above design rationale, this paper proposes an end-to-end network for predicting brightness adjustment factors that combines convolutional feature extraction with the global modeling capabilities of the Swin Transformer. The technique seeks to produce high-resolution factor distributions for backlight modification through upsampling and precisely restore spatially continuous and structurally consistent brightness dimming coefficient maps from low-resolution inputs. As shown in Figure 2, the complete network consists of three parts, namely an encoder, a Swin Transformer feature augmentation module, and a decoder.
The detailed architecture of the proposed network is summarized in Table 1. The encoder comprises three subsampling convolutional layers, each employing a 4 × 4 convolution with a stride of 2 and padding of 1. This configuration gradually reduces the input feature resolution to H/2, H/4, and H/8 while increasing the number of channels to 32, 64, and 128, respectively, to extract multi-scale texture and structural features. All convolutional layers utilize the ReLU activation function to enhance nonlinear modeling capability. Subsequently, a Bottleneck module employs two consecutive 3 × 3 convolutions to further refine the latent representation.
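The claim that the 4 × 4, stride-2, padding-1 configuration halves the spatial resolution at each stage can be checked with the standard convolution output-size formula; the input size of 256 below is arbitrary, chosen only to make the halving visible.

```python
# Output size of a convolution: out = floor((in + 2*pad - kernel) / stride) + 1.
# With kernel=4, stride=2, pad=1 each encoder layer halves an even input exactly,
# giving the H/2, H/4, H/8 pyramid described in the text.

def conv_out(size, kernel=4, stride=2, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

H = 256
sizes = [H]
for _ in range(3):                 # three subsampling convolutional layers
    sizes.append(conv_out(sizes[-1]))
print(sizes)  # [256, 128, 64, 32] -> H, H/2, H/4, H/8
```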
To overcome the limitations of convolutions in representing global illumination trends and long-range dependencies, a two-layer Swin Transformer module is integrated between the encoder and decoder. Each Swin Block consists of LayerNorm, windowed multi-head self-attention (W-MSA/SW-MSA), residual connections, and an MLP (Linear → GELU → Linear). By alternating between regular and shifted windows, the network computes effective attention within local windows while enhancing global modeling capabilities through cross-window correlations. This process yields an enhanced feature representation P_s with superior edge preservation and global consistency after processing through two Swin Block levels.
Finally, the decoder gradually restores spatial resolution through three deconvolution modules, reducing the number of feature channels from 128 to 64, then to 32, and finally to 1. A Sigmoid activation function is applied to generate a brightness adjustment factor map scaled to the [0, 1] range. This map is further upscaled to full resolution using bilinear interpolation to create the final backlight adjustment layer.
The integration of the pre-trained Swin Transformer module addresses three critical limitations of existing local dimming methods. First, unlike standard CNNs that suffer from limited receptive fields and often produce block artifacts in uniform regions, the shifted-window self-attention mechanism enables global luminance correlation modeling across all dimming partitions. Second, in contrast to pure Transformer architectures with quadratic computational complexity, the window-based linear attention of the Swin Transformer, combined with our bottleneck placement strategy, ensures real-time feasibility; specifically, our design adapts the standard Swin Transformer to the low-resolution partition maps by optimizing the window size and MLP ratio, reducing computational cost by approximately 80 percent while maintaining performance. Third, whereas existing methods lack explicit temporal modeling and therefore suffer from flickering, our spatio-temporal extension enforces frame-to-frame consistency through adaptive temporal constraints.

2.2.2. Loss Function

In this study, a multi-loss joint optimization framework is developed to guarantee that the projected luminance factor map achieves high accuracy, strong structural consistency, and spatial smoothness. The total loss function consists of three components: an L1 pixel loss term to ensure luminance mapping accuracy, a structural similarity loss term to improve the structural preservation capability of predictions in edge and texture regions, and a smoothness regularization term to enforce spatial continuity across adjacent dimming partitions.
First, the L1 loss term L_{L1} = \| X_d - X_{gt} \|_1 limits the pixel-level error between the predicted value X_d and the ground truth X_{gt}, guaranteeing overall luminance consistency. Second, a structural similarity loss
L_{SSIM} = 1 - \mathrm{SSIM}(X_d, X_{gt})
is introduced to improve structural preservation in edge and texture regions. Lastly, a gradient-based smoothing constraint
L_{smooth} = \| \nabla_x X_d \|_1 + \| \nabla_y X_d \|_1
is used to suppress noise and local oscillations by enforcing spatial continuity of the brightness factor map. Combining these elements, the overall loss function is described as
L = \lambda_1 L_{L1} + \lambda_2 L_{SSIM} + \lambda_3 L_{smooth}
where λ 1 , λ 2 , and λ 3 are the weight parameters. This seeks to achieve more accurate and aesthetically pleasing brightness control effects by balancing the impact of various loss terms on the final model performance. λ1 was set to 1.0 for the L1 pixel loss to ensure luminance mapping accuracy; λ2 was assigned 0.5 for the SSIM loss to preserve structural information; and λ3 was configured as 0.2 for the smoothness loss to maintain spatial continuity without compromising contrast. Early stopping with patience = 25 epochs was applied based on validation PSNR plateau, but we trained for a fixed maximum of 200 epochs to ensure full convergence.
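The weighted combination can be sketched as below. This is a simplified illustration with the reported weights (λ1 = 1.0, λ2 = 0.5, λ3 = 0.2): the SSIM term is abstracted as a precomputed score in [0, 1] rather than implemented in full, and the maps are plain nested lists standing in for tensors.

```python
# Sketch of the three-term objective: L = λ1*L1 + λ2*(1 - SSIM) + λ3*L_smooth.

def l1_loss(pred, gt):
    # Mean absolute pixel-level error between prediction and ground truth.
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def smooth_loss(pred_2d):
    # Sum of absolute horizontal and vertical gradients of the factor map.
    h = sum(abs(row[x + 1] - row[x]) for row in pred_2d for x in range(len(row) - 1))
    v = sum(abs(pred_2d[y + 1][x] - pred_2d[y][x])
            for y in range(len(pred_2d) - 1) for x in range(len(pred_2d[0])))
    return h + v

def total_loss(pred_2d, gt_2d, ssim_score, lam1=1.0, lam2=0.5, lam3=0.2):
    pred = [p for row in pred_2d for p in row]
    gt = [g for row in gt_2d for g in row]
    return (lam1 * l1_loss(pred, gt)
            + lam2 * (1.0 - ssim_score)        # L_SSIM = 1 - SSIM
            + lam3 * smooth_loss(pred_2d))
```

A perfect, spatially constant prediction with SSIM = 1 yields zero loss, as expected from the formulation.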

2.3. Backlight Constraint

The backlight constraint module is positioned between backlight estimation and temporal optimization within the SwinLightNet framework. Its primary purpose is to structurally regulate the initial backlight distribution without compromising content adaptivity, thereby enhancing spatial continuity and luminance stability. Unlike approaches that directly limit backlight amplitudes, this module constrains the relative relationships among backlight partitions from the perspectives of local consistency and contrast preservation, alleviating block artifacts and abrupt luminance transitions commonly observed in local dimming systems.
The module operates on an r_b × c_b backlight partition grid and consists of two consecutive stages: local luminance consistency enforcement and contrast-preserving adaptive adjustment. The output is a constrained backlight grid that serves as a stable input for subsequent temporal fusion.

2.3.1. Local Luminance Consistency Constraint

After initial backlight estimation, discontinuous luminance variations may arise between adjacent partitions due to content differences or estimation errors. To address this issue, a neighborhood-based local consistency constraint is introduced to structurally refine the luminance of each backlight partition.
Let B^{raw}_{i,j} denote the initial backlight estimate. The local reference luminance is defined as a weighted average of neighboring partitions:
\bar{B}_{i,j} = \sum_{(m,n) \in N_{i,j}} w_{m,n} B^{raw}_{m,n}
where N_{i,j} represents the neighborhood set of partition (i, j), and the weights w_{m,n} are normalized according to spatial distance. Based on this reference, the luminance of the current partition is adjusted as
B^{0}_{i,j} = B^{raw}_{i,j} + \alpha \left( \bar{B}_{i,j} - B^{raw}_{i,j} \right)
where \alpha \in [0, 1] is a tuning parameter that balances the degree of local smoothing and the original luminance response. By reducing abnormal inter-partition differences, this constraint mitigates luminance discontinuities at block boundaries while preserving the overall luminance structure.
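A minimal sketch of this constraint is given below. Equal weights over the 4-connected neighborhood are assumed for simplicity; the paper normalizes the weights by spatial distance, and the function name is illustrative.

```python
# Sketch of the local consistency constraint: each partition is pulled toward
# the mean of its 4-connected neighbors by a factor alpha in [0, 1].

def local_consistency(B, alpha=0.5):
    r, c = len(B), len(B[0])
    out = [[0.0] * c for _ in range(r)]
    for i in range(r):
        for j in range(c):
            nbrs = [B[m][n] for m, n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= m < r and 0 <= n < c]
            ref = sum(nbrs) / len(nbrs)                 # local reference luminance
            out[i][j] = B[i][j] + alpha * (ref - B[i][j])
    return out
```

A uniform grid is a fixed point of this update for any alpha, which matches the intent: only abnormal inter-partition differences are reduced.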

2.3.2. Contrast-Preserving Adaptive Adjustment

Following local consistency refinement, a contrast-preserving adaptive adjustment mechanism is applied to further enhance the stability of the backlight distribution across varying luminance conditions. Instead of imposing hard clipping, this mechanism suppresses the influence of extreme values through a luminance compression function.
Let B_{min} and B_{max} denote the global minimum and maximum values of the current backlight grid, respectively. The constrained luminance of each partition is obtained by
B^{con}_{i,j} = B_{min} + (B_{max} - B_{min}) \cdot \frac{B^{0}_{i,j} - B_{min}}{B_{max} - B_{min} + \varepsilon}
where \varepsilon is a stabilization term introduced to prevent numerical amplification under extreme conditions. This mapping compresses the dynamic range while preserving relative contrast relationships among partitions, maintaining the luminance hierarchy between bright and dark regions.
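The compression step can be sketched directly from the formula; the helper name is illustrative and the default epsilon is an assumed value.

```python
# Sketch of the contrast-preserving adjustment: values are renormalized against
# the grid's global min/max with a small epsilon guarding the denominator, so
# the relative ordering (and hence the contrast hierarchy) is preserved.

def compress_grid(B, eps=1e-6):
    flat = [v for row in B for v in row]
    b_min, b_max = min(flat), max(flat)
    scale = (b_max - b_min) / (b_max - b_min + eps)   # slightly < 1
    return [[b_min + (v - b_min) * scale for v in row] for row in B]
```

Because the mapping is affine and monotonically increasing, no hard clipping occurs and the ordering of partitions is unchanged.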
Through the two-stage constraint process, the resulting backlight distribution exhibits improved spatial continuity and smoother luminance transitions, suppressing flicker and halo artifacts in local dimming. The proposed constraint strategy is entirely based on software-level statistical and structural modeling, facilitating parameter tuning and scene adaptivity, and providing robust support for the stable performance of SwinLightNet in video display applications.

2.4. Optimal Backlight Decision

The optimal backlight decision module is designed to perform temporal coordination and optimization of backlight distributions in video sequences. Its objective is to suppress flicker and abrupt luminance transitions caused by inter-frame variations while preserving responsiveness to image content. Operating on block-level backlight representations, the module models the variation trends of backlight distributions across consecutive frames and applies adaptive temporal correction to the current backlight estimation, thereby generating a temporally continuous and stable final backlight output.
The module processes backlight data on an r_b × c_b partition grid, and the resulting backlight distribution is directly forwarded to the pixel compensation stage, making it particularly suitable for display optimization in dynamic scenes.

2.4.1. Change-Aware Inter-Frame Backlight Modeling

During temporal decision making, the backlight estimation of the current frame serves as the primary information source. Let B^{cur}_t denote the spatially refined backlight estimation of the current frame, and B^{final}_{t-1} denote the final backlight result of the previous frame. The difference between these two reflects the intensity of scene luminance variation. Accordingly, a block-level difference-based variation metric is introduced:
D_t = \frac{1}{r_b c_b} \sum_{i,j} \left| B^{cur}_t(i,j) - B^{final}_{t-1}(i,j) \right|
where D_t characterizes the overall magnitude of backlight change between adjacent frames. When no previous frame is available, D_t is set to zero by default, indicating that no temporal constraint is applied.
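The change metric is a plain mean absolute difference over the grid, sketched below with the first-frame convention from the text; the function name is illustrative.

```python
# Sketch of the change metric D_t: mean absolute difference between the current
# frame's refined estimate and the previous frame's final backlight grid.

def change_metric(B_cur, B_prev):
    if B_prev is None:                       # first frame: no temporal constraint
        return 0.0
    r, c = len(B_cur), len(B_cur[0])
    return sum(abs(B_cur[i][j] - B_prev[i][j])
               for i in range(r) for j in range(c)) / (r * c)
```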

2.4.2. Adaptive Incremental Temporal Adjustment

To prevent excessive reliance on historical information under rapid scene changes, a dynamic constraint mechanism is introduced to regulate temporal fusion. Let s_t denote the content-driven fusion weight and g_t a dynamic upper-bound function; the final temporal fusion weight is defined as
\omega_t = \min(s_t, g_t)
where \omega_t is further constrained within the interval [\omega_{min}, \omega_{max}] to balance temporal stability and responsiveness.
Based on this weight, temporal modulation is applied to the current backlight estimation, yielding the fused backlight distribution:
B^{f}_t = \omega_t B^{cur}_t
This fusion strategy preserves content adaptivity of the current frame while introducing implicit temporal continuity, resulting in smoother and more natural backlight transitions, particularly in video sequence processing (Figure 3).
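The weight computation and modulation can be sketched as follows. Here s_t and g_t are treated as given scalars (the paper derives them from inter-frame statistics), and the clamp bounds are assumed illustrative values, not reported parameters.

```python
# Sketch of the adaptive temporal weight: the content-driven weight s_t is
# capped by the dynamic upper bound g_t, then clamped to [w_min, w_max], and
# finally applied multiplicatively to the current backlight estimate.

def fusion_weight(s_t, g_t, w_min=0.2, w_max=0.95):
    return max(w_min, min(min(s_t, g_t), w_max))

def apply_weight(B_cur, w):
    return [[w * v for v in row] for row in B_cur]
```

The lower bound keeps the output responsive even when the bound g_t is very tight, while the upper bound prevents the temporal constraint from being bypassed entirely.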

2.4.3. Output Normalization and Module Analysis

To ensure numerical stability and reliability, a unified normalization process is applied to the final backlight result:
B^{out}_t = N\left(B^{final}_t\right)
where N(\cdot) denotes normalization and range-mapping operations that constrain backlight intensities to a valid and stable range.
After processing this module, the output backlight grid exhibits strong temporal smoothness while maintaining the ability to rapidly adapt to scene luminance variations. By integrating change-aware modeling, incremental temporal updating, and numerical normalization into a unified decision pipeline, the proposed module mitigates flicker and luminance discontinuities in video display, providing critical support for achieving high visual quality with SwinLightNet under complex dynamic content.

3. Simulation Design

In the simulation experiments, the block-level backlight results generated by the backlight optimization algorithm are first subjected to spatial smoothing to construct a continuous backlight distribution. Subsequently, a pixel-level compensation mechanism is applied to perform luminance fusion and correction on the smoothed backlight data, alleviating luminance discontinuities and color imbalance caused by block boundaries, and ensuring visual consistency and naturalness in the final output images. During evaluation, multiple objective image quality metrics are employed to quantitatively analyze the simulation results of different methods, enabling a comprehensive assessment of their performance in luminance reconstruction accuracy and perceptual quality.

3.1. Pixel Compensation

The pixel compensation module serves as the final display mapping stage of the SwinLightNet framework. Its primary function is to convert block-level optimized backlight results into continuous pixel-level modulation signals and to perform luminance-consistent adjustment with the original image, thereby enhancing local contrast and overall dynamic range. Designed with perceptual consistency as the objective, this module generates the final output images suitable for Mini-LED display systems through multi-scale propagation, luminance reconstruction, and constraint-aware fusion.
Since the output of the backlight decision module is represented as a low-resolution backlight grid, direct upsampling may introduce perceptible discontinuities at block boundaries. To address this issue, instead of employing simple interpolation, a continuous propagation-based pixel-domain backlight reconstruction strategy is adopted.
Let B^{final}_t(i,j) denote the final backlight grid. It is first mapped to a sparse guidance representation, and a continuous backlight distribution is then generated in the pixel domain via distance-weighted propagation:
B_{cont}(x, y) = \frac{\sum_{i,j} B^{final}_t(i,j)\, w_{i,j}(x, y)}{\sum_{i,j} w_{i,j}(x, y)}
where the weighting function w_{i,j}(x, y) is determined by the spatial distance between the pixel location and the center of the corresponding backlight partition, ensuring smooth transitions between neighboring regions. This formulation avoids explicit block boundaries and yields a spatially continuous backlight distribution in the pixel domain.
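The propagation step can be sketched as below. An inverse-distance weight is assumed for illustration, since the paper specifies only that the weight depends on the distance to each partition center; the brute-force double loop is for clarity, not efficiency.

```python
# Sketch of distance-weighted propagation: each pixel's backlight is a
# normalized, weighted average over all partition centers, avoiding hard
# block boundaries that simple nearest-block upsampling would create.

def propagate(B, H, W):
    r, c = len(B), len(B[0])
    h_b, w_b = H / r, W / c
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            num = den = 0.0
            for i in range(r):
                for j in range(c):
                    cy, cx = (i + 0.5) * h_b, (j + 0.5) * w_b   # partition center
                    w = 1.0 / (1.0 + ((y - cy) ** 2 + (x - cx) ** 2) ** 0.5)
                    num += B[i][j] * w
                    den += w
            out[y][x] = num / den
    return out
```

Because the weights are normalized, a uniform backlight grid propagates to a uniform pixel-level distribution, confirming that the scheme introduces no luminance bias of its own.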
After obtaining the continuous backlight distribution, it is interpreted as a pixel-level transmittance modulation factor of the display system rather than a direct luminance addition term. To enhance the display dynamic range, a nonlinear remapping is applied:
T(x, y) = \left( \frac{B_{cont}(x, y)}{B_{max}} \right)^{\rho}
where ρ denotes the transmittance adjustment exponent that controls the balance between highlight expansion and dark-region compression. This nonlinear modeling more closely reflects the coupling between backlight intensity and liquid crystal transmittance in real display systems, contributing to improved highlight clarity while preserving dark-region details.
The final output image is obtained by applying pixel-level transmittance modulation to the original input image. Let I(x, y) denote the luminance component of the input image; then the compensated luminance is given by
I_{out}(x, y) = I(x, y) \cdot T(x, y)
This multiplicative modulation avoids contrast compression that may arise from linear weighted fusion, allowing backlight adjustment to directly influence pixel luminance representation and thereby enhance local contrast.
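The remapping, modulation, and clipping steps compose into one per-pixel operation, sketched below. The exponent value rho = 0.8 and the [0, 1] output range are assumed for illustration; the paper does not report these constants.

```python
# Sketch of the display mapping: the continuous backlight becomes a
# transmittance factor T = (B_cont / B_max) ** rho, applied multiplicatively
# to the input luminance, with the result clipped to the valid display range.

def compensate(I, B_cont, b_max, rho=0.8, lo=0.0, hi=1.0):
    out = []
    for row_i, row_b in zip(I, B_cont):
        out.append([min(hi, max(lo, i * (b / b_max) ** rho))
                    for i, b in zip(row_i, row_b)])
    return out
```

With B_cont = B_max the transmittance is 1 and the input passes through unchanged; dimmer backlight regions attenuate the pixel luminance multiplicatively rather than additively, which is what preserves local contrast.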
To ensure display safety and numerical stability, a global constraint is applied to the output:
I_{final}(x, y) = C\left(I_{out}(x, y)\right)
where C(\cdot) denotes luminance clipping and dynamic range mapping operations, ensuring that the final image remains within the valid luminance range of the display system.
By interpreting the backlight result as a continuous transmittance modulation signal rather than a simple luminance compensation term, the proposed pixel compensation module more faithfully simulates the collaborative relationship between backlight control and pixel response in Mini-LED display systems. This approach suppresses block artifacts, significantly enhances local contrast, and maintains smooth luminance transitions in dynamic scenes, providing high-quality, perceptually consistent visual output for the SwinLightNet framework.

3.2. Experimental Results

3.2.1. Comparison with Baseline Methods

In this experiment, several backlight extraction algorithms are employed for performance comparison, including the maximum-value method, mean-value method, root mean square (RMS) method, standard deviation-based method, error correction method, and the cumulative distribution function (CDF) thresholding method [20,21,22,23,24]. Since subjective visual assessment inherently involves a certain degree of randomness, objective image quality metrics are required to provide a comprehensive and quantitative evaluation of the processed images.
Accordingly, peak signal-to-noise ratio (PSNR), information entropy (IE), and structural similarity index (SSIM) are adopted as objective evaluation metrics in this study [25]. These metrics are used to quantitatively assess image fidelity before and after processing, contrast enhancement effectiveness, power-saving potential, information richness, and structural similarity (Figure 4).
PSNR measures the noise level between the original image I and the processed image K and is widely used to evaluate image fidelity. Information entropy (IE) reflects the richness of image information and is employed to assess the degree of detail preservation. SSIM evaluates the structural similarity between the original image x and the processed image y by jointly considering luminance, contrast, and structural components. In the SSIM computation, \mu_x and \mu_y denote the mean intensities, \sigma_x^2 and \sigma_y^2 the variances, and \sigma_{xy} the covariance between the two images. The stabilizing constants are defined as c_1 = (k_1 L)^2 and c_2 = (k_2 L)^2, where L = 255, k_1 = 0.01, and k_2 = 0.03. In local dimming applications, an SSIM value closer to 1 indicates higher structural fidelity.
PSNR, SSIM and IE are computed as follows:
PR = 1 m , n k m , n r b × c b × 100 % , IE = i = 0 255 p ( i ) l o g 2 p ( i ) , SSIM ( x , y ) = 2 μ x μ y + c 1 ) ( 2 σ x y + c 2 μ x 2 + μ y 2 + c 1 ) ( σ x 2 + σ y 2 + c 2 .
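These definitions translate directly into NumPy. In this sketch, PR is read as the backlight power saving implied by its formula (with zone values k normalized to [0, 1]), and SSIM is computed globally over the whole image rather than with the usual sliding window; both simplifications are ours.

```python
import numpy as np

def power_reduction(k, r_b, c_b):
    """PR = (1 - sum(k) / (r_b * c_b)) * 100, with k in [0, 1]."""
    return (1.0 - k.sum() / (r_b * c_b)) * 100.0

def information_entropy(gray):
    """IE = -sum p(i) log2 p(i) over the 8-bit gray-level histogram (bits)."""
    hist = np.bincount(gray.ravel().astype(np.uint8), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                              # 0 * log2(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM with c1 = (k1 L)^2 and c2 = (k2 L)^2."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()        # covariance sigma_xy
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A backlight map held at half duty yields PR = 50%, a constant image has zero entropy, and identical images give SSIM = 1, matching the intended behavior of each metric.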
As shown in Figure 5, Figure 6 and Figure 7, the performance indicators of the different backlight algorithms were compared on the DIV2K dataset. Quantitative results on three representative test scenarios demonstrate that the proposed SwinLightNet consistently outperforms the compared methods in overall performance and stability. In terms of PSNR, SwinLightNet achieves the highest or near-highest values across all three test sets, with particularly pronounced advantages in the second and third scenarios involving complex luminance variations, indicating improved robustness in structural fidelity and luminance reconstruction accuracy.
Regarding the power reduction ratio (PR), SwinLightNet shows a clear advantage over traditional statistical backlight methods in the first test scenario and maintains stable performance in the remaining cases, reflecting its ability to reduce backlight power while suppressing distortion. Meanwhile, the entropy distribution produced by SwinLightNet remains well balanced, avoiding the luminance noise and instability introduced by over-enhancement in methods such as MEAN and MAX.
Overall, SwinLightNet achieves a favorable trade-off among structural accuracy, perceptual enhancement, and luminance stability, validating its effectiveness in generating high-quality Mini-LED backlight distributions under diverse content conditions.
The following experiments are conducted on the DIV2K validation set with an input resolution of 2040 × 1356 and a partition grid of 40 × 64 dimming zones. The ground truth is defined as the original input image to measure the fidelity of the final displayed image against the original content.
As shown in Table 2, the proposed SwinLightNet achieves the best overall performance on the low-contrast image set. Specifically, SwinLightNet obtains the highest PSNR value of 46.93 dB, indicating superior luminance reconstruction accuracy compared with traditional statistical backlight methods. In terms of structural similarity, SwinLightNet achieves the highest SSIM value of 0.9994, significantly outperforming the other methods and demonstrating its strong ability to preserve image structures under low-contrast conditions. In addition, SwinLightNet maintains a competitive information entropy value of 5.14, which is comparable to other statistical approaches while providing substantially higher reconstruction quality. In contrast, methods such as MAX and MEAN exhibit lower PSNR and SSIM values, indicating limited capability in accurately reconstructing low-contrast luminance distributions. Overall, these results confirm that SwinLightNet provides a more balanced and robust backlight optimization strategy for low-contrast images.
The quantitative results demonstrate that the proposed SwinLightNet achieves an effective balance between reconstruction accuracy and computational efficiency. As reported in Table 3, SwinLightNet attains a PSNR of 46.93 dB and an SSIM of 0.9994 with only 1.184 million parameters and 0.088 GFLOPs. Although StandardSwin, Uformer, Restormer, and SwinIR achieve slightly higher PSNR values ranging from 48.25 dB to 49.77 dB, these improvements are accompanied by a substantial increase in model complexity. Their parameter scales vary from 15.102 million to 187.759 million, and the computational cost ranges from 0.313 GFLOPs to 17.176 GFLOPs. In contrast, SwinLightNet reduces parameters and computational overhead by a large margin while maintaining highly competitive reconstruction performance. These results indicate that the proposed method offers a superior trade-off between accuracy and efficiency, making it more suitable for practical and resource-constrained Mini-LED backlight optimization applications.

3.2.2. Ablation Study

To evaluate the effectiveness of the proposed architecture, an ablation study was conducted on several representative network variants: the full SwinLightNet; SwinLightNet-NoSwin, a variant without the Swin Transformer module; CNNOnlyNet, a purely convolutional network; SimplifiedSwinLightNet, a lightweight version with reduced channels; DeepSwinLightNet, a deeper version with additional encoder–decoder layers; and WideSwinLightNet, a wider version with expanded channels. These variants were designed to analyze the impact of global attention modeling, convolutional feature extraction, and network capacity on backlight estimation performance.
CNNOnlyNet adopts a conventional convolutional encoder–decoder architecture composed of stacked convolution and up-sampling layers and therefore relies solely on local feature extraction. SwinLightNet-NoSwin removes the Swin Transformer block from the proposed network while keeping the same encoder–decoder structure, isolating the contribution of transformer-based global modeling. SimplifiedSwinLightNet reduces the number of channels in each stage to evaluate the effect of model capacity, while DeepSwinLightNet increases the depth and WideSwinLightNet the width of the network to explore whether larger architectures bring further performance improvements.
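The capacity figures in Table 4 can be sanity-checked against the layer specification in Table 1. Counting the weights and biases of the eight (de)convolutions alone, i.e., the SwinLightNet-NoSwin variant with the Swin blocks removed, reproduces the reported 0.919 M parameters:

```python
def conv_params(cin, cout, k):
    """Weights + biases of a Conv2d / ConvTranspose2d with a k x k kernel."""
    return cin * cout * k * k + cout

# (in_channels, out_channels, kernel) per Table 1, Swin blocks excluded
layers = [
    (1, 32, 4), (32, 64, 4), (64, 128, 4),   # encoder, stride-2 convs
    (128, 256, 3), (256, 128, 3),            # bottleneck convs
    (128, 64, 4), (64, 32, 4), (32, 1, 4),   # decoder, transposed convs
]
total = sum(conv_params(*l) for l in layers)
print(total / 1e6)   # -> 0.919233, matching the SwinLightNet-NoSwin row
```

The exact match suggests the convolutional backbone carries no other parameterized layers; the remaining ~0.265 M parameters of the full model are attributable to the two Swin blocks.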
The ablation study further verifies the effectiveness of the proposed SwinLightNet system design. As shown in Table 4, SwinLightNet achieves a PSNR of 46.93 dB and an SSIM of 0.9994 with only 1.184 M parameters and 0.088 GFLOPs, demonstrating a favorable balance between reconstruction accuracy and computational efficiency. When the Swin Transformer module is removed in SwinLightNet-NoSwin, the PSNR decreases to 43.25 dB, indicating that the global context modeling provided by the transformer plays an important role in backlight optimization. Compared with the purely convolutional CNNOnlyNet, SwinLightNet achieves modest but consistent gains in reconstruction accuracy. Notably, CNNOnlyNet also performs competitively (44.18 dB), suggesting that the overall system framework, including the encoder–decoder structure and the loss functions, contributes significantly to performance; the Swin module further enhances global luminance consistency, yielding the best balance between accuracy and complexity. Increasing the network depth in DeepSwinLightNet or the width in WideSwinLightNet does not yield further gains over SwinLightNet (45.77 dB and 45.85 dB, respectively), while introducing markedly higher parameter counts and computational costs. In contrast, SimplifiedSwinLightNet greatly reduces model complexity but suffers a substantial performance degradation. Overall, the proposed SwinLightNet provides a well-balanced architecture that integrates local feature extraction and global context modeling, achieving competitive reconstruction performance at relatively low computational complexity.
A sensitivity analysis of the loss weight coefficients was conducted, with results summarized in Table 5. Configuration 5 (λ1 = 1.0, λ2 = 0.5, λ3 = 0.2) achieves the highest PSNR of 45.117 dB and SSIM of 0.9998. Removing the smoothness constraint (λ3 = 0) reduces the PSNR to 40.792 dB, confirming the critical role of spatial regularization. Configuration 5 is therefore selected as the optimal setting, balancing luminance accuracy and structural fidelity for Mini-LED local dimming systems.
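The role of the smoothness weight λ3 can be illustrated with a neighbor-difference (total-variation-style) penalty on the backlight zone map. The concrete loss terms are defined in the methodology section; here `l_rec` and `l_per` are placeholders for the other two weighted terms, and only the weighted-sum structure with Configuration 5's weights follows the analysis above.

```python
import numpy as np

def smoothness_loss(b):
    """Mean absolute difference between neighboring dimming zones;
    penalizes abrupt backlight steps that cause block artifacts."""
    dh = np.abs(np.diff(b, axis=0)).mean()   # vertical neighbor differences
    dw = np.abs(np.diff(b, axis=1)).mean()   # horizontal neighbor differences
    return dh + dw

def total_loss(l_rec, l_per, b, lam1=1.0, lam2=0.5, lam3=0.2):
    """Configuration 5 weighting: L = lam1*L_rec + lam2*L_per + lam3*L_smooth."""
    return lam1 * l_rec + lam2 * l_per + lam3 * smoothness_loss(b)
```

A perfectly uniform zone map incurs zero smoothness penalty, while a hard luminance step between adjacent zones is penalized in proportion to its height, which is why dropping λ3 lets block-like discontinuities reappear.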
The impact of window size and the Swin module on model performance was systematically evaluated, and the results are summarized in Table 6. When the Swin module is enabled, increasing the window size from 1 to 8 leads to a consistent improvement in reconstruction accuracy. Specifically, the PSNR increases from 45.532 dB at window size 1 to 46.933 dB at window size 8, while SSIM improves from 0.9992 to 0.9994. Notably, these performance gains are achieved without increasing the number of parameters, which remains at 1.1842 million, and with only minor variation in computational cost around 0.086 to 0.089 GFLOPs. However, further enlarging the window size to 16 results in a significant performance degradation, with PSNR dropping to 43.776 dB and computational cost rising to 0.1455 GFLOPs, indicating that excessively large windows are not suitable for low-resolution luminance partition maps. In contrast, disabling the Swin module reduces the parameter count to 0.9192 million and FLOPs to 0.0777 GFLOPs, but the PSNR decreases to 43.257 dB, confirming the importance of transformer-based global modeling. Overall, a window size of 8 provides the best trade-off between reconstruction performance and computational efficiency for the proposed SwinLightNet architecture.
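The FLOPs jump at window size 16 is consistent with how window attention behaves on the 5 × 8 bottleneck map: the map must be padded up to a multiple of the window size, and the attention cost inside each window grows quadratically with the token count. The sketch below counts tokens under these assumptions; it models only the attention term, not the full layer cost.

```python
import math

def window_attention_tokens(h, w, ws):
    """Padded size, window count, and relative attention cost for
    (shifted-)window attention on an h x w feature map."""
    ph = math.ceil(h / ws) * ws              # pad height to a multiple of ws
    pw = math.ceil(w / ws) * ws              # pad width to a multiple of ws
    n_windows = (ph // ws) * (pw // ws)
    tokens = ws * ws                         # tokens per window
    attn_cost = n_windows * tokens ** 2      # attention scales with tokens^2
    return (ph, pw), n_windows, attn_cost

# ws = 8: 5x8 pads to 8x8, one window of 64 tokens.
# ws = 16: 5x8 pads to 16x16, one window of 256 tokens -> 16x the attention
# cost, with most of the padded area carrying no real content.
for ws in (1, 2, 4, 8, 16):
    print(ws, window_attention_tokens(5, 8, ws))
```

This matches the observed behavior: up to window size 8 the whole map fits in one lightly padded window, while size 16 inflates both the padding and the per-window cost without adding usable context.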

4. Conclusions

This paper presents SwinLightNet, a task-specific hybrid CNN–Swin Transformer system designed for Mini-LED local dimming. The design integrates a lightweight CNN encoder with a bottleneck-placed Swin Transformer module, effectively balancing local feature extraction and global luminance correlation modeling under hardware constraints. A differentiable temporal constraint module is introduced to enforce frame-to-frame consistency, suppressing flickering while preserving motion responsiveness through adaptive weighting. Extensive experiments demonstrate that SwinLightNet achieves a state-of-the-art performance, outperforming existing methods in both quantitative metrics (PSNR, SSIM) and qualitative visual quality, with effective suppression of common artifacts such as halos and blockiness.
Despite these promising results, the approach has certain limitations. Real-time inference speed has not been validated on actual commercial display driver hardware, and generalization under extreme scenarios—such as ultra-low-light conditions or scenes with extremely rapid motion—warrants further investigation, as training and evaluation were primarily conducted on the DIV2K dataset. Future work will focus on hardware-aware deployment, including real-time benchmarking on embedded GPUs and FPGA platforms, enhancement of the temporal module with advanced motion estimation techniques, and expansion of training data to improve robustness under diverse and challenging conditions.

Author Contributions

Conceptualization, J.J., R.P., J.L. and M.Z.; methodology, J.J., R.P. and J.L.; software, J.J., R.P. and J.L.; validation, J.J., R.P. and J.L.; formal analysis, J.J., R.P. and J.L.; investigation, J.J., R.P. and J.L.; resources, J.J., R.P. and J.L.; data curation, J.J., R.P. and J.L.; writing—original draft preparation, J.J., R.P. and J.L.; writing—review and editing, M.Z.; supervision, M.Z.; funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hubei Provincial Department of Education (Grant No. B2020046), XinXin Semiconductor Manufacturing Co., Ltd. (Grant No. 2021802), and the PhD Research Foundation of Hubei University of Technology (Grant No. 00185).

Data Availability Statement

The DIV2K dataset is publicly available from the New Trends in Image Restoration and Enhancement (NTIRE) website: https://data.vision.ee.ethz.ch/cvl/DIV2K/ (accessed on 12 March 2026).

Conflicts of Interest

The authors declare that this study was funded by Xinxin Semiconductor Manufacturing Co., Ltd. (Grant Number: 2021802). The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Yang, Z.; Hsiang, E.-L.; Qian, Y.; Wu, S.-T. Performance comparison between mini-LED backlit LCD and OLED display for 15.6-inch notebook computers. Appl. Sci. 2022, 12, 1239. [Google Scholar] [CrossRef]
  2. Gao, Z.; Ning, H.; Yao, R.; Xu, W.; Zou, W.; Guo, C.; Luo, D.; Xu, H.; Xiao, J. Mini-LED backlight technology progress for liquid crystal display. Crystals 2022, 12, 313. [Google Scholar] [CrossRef]
  3. Chen, E.; Guo, J.; Jiang, Z.; Shen, Q.; Ye, Y.; Xu, S.; Sun, J.; Yan, Q.; Guo, T. Edge/direct-lit hybrid mini-LED backlight with U-grooved light guiding plates for local dimming. Opt. Express 2021, 29, 12179–12194. [Google Scholar] [CrossRef] [PubMed]
  4. Schmidt, M.; Grüning, M.; Ritter, J.; Hudak, A.; Xu, C. Impact of High-Resolution Matrix Backlight on Local Dimming Performance and Its Characterization. J. Inf. Disp. 2019, 20, 95–104. [Google Scholar] [CrossRef]
  5. Lei, J.; Zhu, H.; Huang, X.; Lin, J.; Zheng, Y.; Lu, Y.; Chen, Z.; Guo, W. Mini-LED Backlight: Advances and Future Perspectives. Crystals 2024, 14, 922. [Google Scholar] [CrossRef]
  6. Kwon, J.U.; Bang, S.; Kang, D.; Yoo, J.J. 65-2: The Required Attribute of Displays for High Dynamic Range. In Proceedings of the SID Symposium Digest of Technical Papers; Blackwell Publishing Ltd.: Oxford, UK, 2016; Volume 47, pp. 884–887. [Google Scholar] [CrossRef]
  7. Kang, S.-J.; Bae, S. Fast Segmentation-Based Backlight Dimming. J. Disp. Technol. 2015, 11, 399–402. [Google Scholar] [CrossRef]
  8. Zhu, R.; Sarkar, A.; Emerton, N.; Large, T. 81-3: Reproducing High Dynamic Range Contents Adaptively based on Display Specifications. In Proceedings of the SID Symposium Digest of Technical Papers; John Wiley & Sons: Hoboken, NJ, USA, 2017; Volume 48, pp. 1188–1191. [Google Scholar] [CrossRef]
  9. Chen, E.; Fan, Z.; Zhang, K.; Huang, C.; Xu, S.; Ye, Y.; Sun, J.; Yan, Q.; Guo, T. Broadband beam collimation metasurface for full-color micro-LED displays. Opt. Express 2024, 32, 10252–10264. [Google Scholar] [CrossRef] [PubMed]
  10. Zheng, X.; Guo, W.; Tong, C.; Zeng, P.; Chen, G.; Gao, Y.; Zhu, L.; Chen, Y.; Wang, S.; Lin, Z.; et al. Origin of the inhomogeneous electroluminescence of GaN-based green mini-LEDs unveiled by microscopic hyperspectral imaging. ACS Photon 2022, 9, 3685–3695. [Google Scholar] [CrossRef]
  11. Tong, C.; Yang, H.; Zheng, X.; Chen, Y.; He, J.; Wu, T.; Lu, Y.; Chen, Z.; Guo, W. Luminous characteristics of RGBW mini-LED integrated matrix devices for healthy displays. Opt. Laser Technol. 2024, 170, 110229. [Google Scholar] [CrossRef]
  12. Song, S.-J.; Kim, Y.I.; Bae, J.; Nam, H. Deep Learning Based Pixel Compensation Algorithm For Local Dimming Liquid Crystal Displays of Quantum-dot Backlights. Opt. Express 2019, 27, 15907–15917. [Google Scholar] [CrossRef] [PubMed]
  13. Chia, T.-L.; Syu, Y.-Y.; Huang, P.-S. A Novel Local Dimming Approach by Controlling LCD Backlight Modules via Deep Learning. Information 2025, 16, 815. [Google Scholar] [CrossRef]
  14. Zheng, Y.; Wang, W. Lightweight CNN-Based Local Backlight Dimming for HDR Displays. In Proceedings of the 2025 37th Chinese Control and Decision Conference (CCDC); IEEE: New York, NY, USA, 2025; pp. 365–370. [Google Scholar] [CrossRef]
  15. Zhang, T.; Wang, H.; Du, W.; Li, M. Deep CNN-Based Local Dimming Technology. Appl. Intell. 2022, 52, 903–915. [Google Scholar] [CrossRef]
  16. Han, B.A.; Yang, J.J. Research on adaptive job shop scheduling problems based on dueling double DQN. IEEE Access 2020, 8, 186474–186495. [Google Scholar] [CrossRef]
  17. Salh, A.; Audah, L.; Alhartomi, M.A.; Kim, K.S.; Alsamhi, S.H.; Almalki, F.A.; Abdullah, Q.; Saif, A.; Algethami, H. Smart packet transmission scheduling in cognitive IoT systems: DDQN based approach. IEEE Access 2022, 10, 50023–50036. [Google Scholar] [CrossRef]
  18. Salh, A.; Audah, L.; Kim, K.S.; Alsamhi, S.H.; Alhartomi, M.A.; Abdullah, Q.; Almalki, F.A.; Algethami, H. Refiner GAN algorithmically enabled deep-RL for guaranteed traffic packets in real-time URLLC B5G communication systems. IEEE Access 2022, 10, 50662–50676. [Google Scholar] [CrossRef]
  19. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2021; pp. 10012–10022. [Google Scholar] [CrossRef]
  20. Lin, F.C.; Huang, Y.P.; Liao, L.Y.; Liao, C.-Y.; Shieh, H.-P.D.; Wang, T.-M. Dynamic backlight gamma on high dynamic range LCD TVs. J. Disp. Technol. 2008, 4, 139–146. [Google Scholar] [CrossRef]
  21. Zhang, X.-B.; Wang, R.; Dong, D.; Han, J.-H.; Wu, H.-X. Dynamic Backlight Adaptation Based on the Details of Image for Liquid Crystal Displays. J. Disp. Technol. 2012, 8, 108–111. [Google Scholar] [CrossRef]
  22. Seetzen, H.; Whitehead, L.A.; Ward, G. 54.2: A high dynamic range display using low and high resolution modulators. In Proceedings of the SID Symposium Digest of Technical Papers; Blackwell Publishing Ltd.: Oxford, UK, 2003; Volume 34, pp. 1450–1453. [Google Scholar] [CrossRef]
  23. Cho, H.; Kwon, O.K. A backlight dimming algorithm for low power and high image quality LCD applications. IEEE Trans. Consum. Electron. 2009, 55, 839–844. [Google Scholar] [CrossRef]
  24. Liu, Y.-Z.; Zheng, X.-R.; Chen, J.-B. Dynamic backlight signal extraction algorithm based on threshold of image CDF for LCD-TV and its hardware implementation. Chin. J. Liq. Cryst. Disp. 2020, 25, 449–453. [Google Scholar] [CrossRef]
  25. Song, S.; Du, B.; Jiang, Z.; Wang, Y. A Dynamic Synchronization Adjustment Method for High-Speed Camera Exposure Time Based on Improved Information Entropy. J. Ordnance Equip. Eng. 2025, 46, 345–352. [Google Scholar]
Figure 1. Flowchart of the SwinLightNet Algorithm.
Figure 2. Network Architecture Diagram.
Figure 3. Comparison of temporal smoothing for the backlight decision. The input sequence includes (a) the previous frame, (b) the current frame, and (c) the derived grayscale luminance map. (d) Results from SwinLightNet without temporal modeling show severe discontinuities and noise in uniform regions (e.g., the sky). In contrast, (e) results from SwinLightNet with the adaptive temporal constraint module exhibit superior smoothness and stability. This confirms that our method suppresses flickering artifacts during scene transitions without compromising local contrast or motion responsiveness.
Figure 4. Schematic illustration of the algorithm simulation.
Figure 5. Line Chart of PSNR Improvement Rate for Different Algorithms.
Figure 6. Line Chart of SSIM Improvement Rate for Different Algorithms.
Figure 7. Line Chart of IE Improvement Rate for Different Algorithms.
Table 1. Model Structure Table.
| Stage | Layer/Module | Kernel/Operation | Channels (In → Out) | Output Size (H × W) | Description |
|---|---|---|---|---|---|
| Input | — | Low-resolution gray map | 1 → 1 | 40 × 64 | Dimming partition luminance map |
| Encoder | ConvBlock1 | Conv2d, k = 4, s = 2, p = 1 + ReLU | 1 → 32 | 20 × 32 | First downsampling |
| Encoder | ConvBlock2 | Conv2d, k = 4, s = 2, p = 1 + ReLU | 32 → 64 | 10 × 16 | Second downsampling |
| Encoder | ConvBlock3 | Conv2d, k = 4, s = 2, p = 1 + ReLU | 64 → 128 | 5 × 8 | Third downsampling |
| Bottleneck | Conv Layer 1 | Conv2d, k = 3, s = 1, p = 1 + ReLU | 128 → 256 | 5 × 8 | Feature expansion |
| Bottleneck | Conv Layer 2 | Conv2d, k = 3, s = 1, p = 1 + ReLU | 256 → 128 | 5 × 8 | Feature compression |
| Swin Stage | Swin Block 1 | W-MSA + MLP (ratio = 2.0) | 128 → 128 | 5 × 8 | Window attention |
| Swin Stage | Swin Block 2 | SW-MSA + MLP (ratio = 2.0) | 128 → 128 | 5 × 8 | Shifted window attention |
| Decoder | DeConv1 | ConvTranspose2d, k = 4, s = 2, p = 1 + ReLU | 128 → 64 | 10 × 16 | First upsampling |
| Decoder | DeConv2 | ConvTranspose2d, k = 4, s = 2, p = 1 + ReLU | 64 → 32 | 20 × 32 | Second upsampling |
| Decoder | DeConv3 | ConvTranspose2d, k = 4, s = 2, p = 1 + Sigmoid | 32 → 1 | 40 × 64 | Backlight factor output |
Table 2. Performance Comparison with Traditional Statistical Backlight Methods on DIV2K Dataset.
| Backlight Method | SwinLightNet | MAX | MEAN | RMS | STD | EC | CDF |
|---|---|---|---|---|---|---|---|
| PSNR | 46.93 | 43.34 | 40.26 | 43.65 | 42.16 | 41.61 | 42.56 |
| IE | 5.14 | 5.22 | 5.37 | 5.11 | 5.29 | 5.29 | 5.22 |
| SSIM | 0.9994 | 0.9559 | 0.9619 | 0.9842 | 0.9628 | 0.9595 | 0.9684 |
Table 3. Performance Comparison with Transformer-Based Deep Learning Methods on DIV2K Dataset.
| Model Name | PSNR (dB) | SSIM | Parameters (M) | Computational Cost (GFLOPs) |
|---|---|---|---|---|
| SwinLightNet | 46.93 | 0.9994 | 1.184 | 0.088 |
| StandardSwin | 48.25 | 0.9987 | 15.102 | 0.313 |
| Uformer | 49.27 | 0.9988 | 56.338 | 6.398 |
| Restormer | 49.298 | 0.9620 | 58.386 | 5.464 |
| SwinIR | 49.77 | 0.9983 | 187.759 | 17.176 |
Table 4. Ablation study on different network architectures.
| Model Name | PSNR (dB) | SSIM | Parameters (M) | Computational Cost (GFLOPs) |
|---|---|---|---|---|
| SwinLightNet | 46.93 | 0.9994 | 1.184 | 0.088 |
| SwinLightNet-NoSwin | 43.25 | 0.9987 | 0.919 | 0.078 |
| CNNOnlyNet | 44.18 | 0.9988 | 1.366 | 0.097 |
| SimplifiedSwinLightNet | 38.39 | 0.9620 | 0.062 | 0.014 |
| DeepSwinLightNet | 45.77 | 0.9983 | 2.039 | 0.112 |
| WideSwinLightNet | 45.85 | 0.9989 | 4.728 | 0.328 |
| SwinLightNet-No Optimal Backlight Decision | 43.89 | 0.9991 | 1.184 | 0.083 |
Table 5. Loss Weight Sensitivity Analysis.
| Config. | λ1 | λ2 | λ3 | PSNR (dB) | SSIM | Note |
|---|---|---|---|---|---|---|
| 1 | 1.0 | 0.5 | 0.1 | 44.711 | 0.9967 | |
| 2 | 1.0 | 0.3 | 0.1 | 44.472 | 0.9997 | |
| 3 | 1.0 | 0.7 | 0.1 | 44.232 | 0.9997 | |
| 4 | 1.0 | 0.5 | 0.1 | 42.632 | 0.9996 | |
| 5 (Ours) | 1.0 | 0.5 | 0.2 | 45.117 | 0.9998 | Best |
| 6 | 0.8 | 0.5 | 0.1 | 44.250 | 0.9997 | |
| 7 | 1.0 | 1.0 | 0.1 | 45.061 | 0.9997 | |
| 8 | 1.0 | 0.5 | 0.0 | 40.792 | 0.9993 | No Smooth |
| 9 | 0.5 | 1.0 | 0.1 | 43.720 | 0.9997 | |
Table 6. Window Size Sensitivity Analysis.
| Config. | Window Size | Use Swin | Params (M) | FLOPs (G) | PSNR (dB) | SSIM |
|---|---|---|---|---|---|---|
| 1 | 1 | Yes | 1.1842 | 0.0884 | 45.532 | 0.9992 |
| 2 | 2 | Yes | 1.1842 | 0.0858 | 46.007 | 0.9992 |
| 3 | 4 | Yes | 1.1842 | 0.0873 | 46.564 | 0.9993 |
| 4 | 8 | Yes | 1.1842 | 0.0885 | 46.933 | 0.9994 |
| 5 | 16 | Yes | 1.1842 | 0.1455 | 43.776 | 0.9988 |
| 6 | None | No | 0.9192 | 0.0777 | 43.257 | 0.9987 |

Share and Cite

MDPI and ACS Style

Li, J.; Pu, R.; Jiang, J.; Zhu, M. A Swin-Transformer-Based Network for Adaptive Backlight Optimization. Symmetry 2026, 18, 502. https://doi.org/10.3390/sym18030502
