Dynamic Tensor Modeling for Missing Data Completion in Electronic Toll Collection Gantry Systems

The deployment of Electronic Toll Collection (ETC) gantry systems marks a transformative advancement in the journey toward an interconnected and intelligent highway traffic infrastructure. The integration of these systems streamlines toll collection and reduces environmental impact by decreasing idle times. To address missing sensor data in large-volume ETC gantry systems and insufficient traffic detection among ETC gantries, this study constructs a high-order tensor model based on an analysis of the high-dimensional, sparse, large-volume, and heterogeneous characteristics of ETC gantry data. In addition, a missing data completion method for the ETC gantry data is proposed based on an improved dynamic tensor flow model. This study approximates the decomposition of neighboring tensor blocks in the high-order tensor model of the ETC gantry data based on tensor Tucker decomposition and the Laplacian matrix. This method captures the correlations among space, time, and user information in the ETC gantry data. Case studies demonstrate that our method enhances ETC gantry data quality across various rates of missing data while also reducing computational complexity. For instance, at a missing data rate below 5%, our approach reduced the RMSE for time vehicle distance by 0.0051, for traffic volume by 0.0056, and for interval speed by 0.0049 compared to the MATRIX method. These improvements indicate a potential for more precise traffic data analysis, add value to the application of ETC systems, and contribute to theoretical and practical advancements in the field.


Introduction
Electronic Toll Collection (ETC) systems signify a remarkable advancement in vehicular management, enabling swift identification and seamless toll transactions. ETC gantry data are characterized by their large volume, diverse modalities, rapid acquisition, coexistence of genuine and erroneous data, and rich potential for transportation applications. These systems are primarily classified into two categories: toll plazas equipped with dedicated ETC lanes and overpass gantries [1,2]. Notably, China has taken a pioneering role in this technological transformation since 2019, systematically phasing out provincial boundary toll stations in favor of the widespread deployment of ETC gantries [3]. This bold initiative has led to the establishment of a robust network, supported by over 230 million ETC users, marking the dawn of a new era in toll collection [4].
ETC systems represent a significant technological advancement in traffic management, characterized by high quality, extensive coverage, and the widespread accessibility of data. These data, renowned for their comprehensiveness, play a crucial role in enabling precise traffic analyses [5][6][7]. However, operating such advanced systems comes with a unique set of challenges, primarily influenced by factors such as system equipment performance, the installation of automatic image capture and recognition systems, weather and lighting conditions, and the proper functioning of equipment [8].
be sensitive to outliers, may not work well when dealing with sparse matrices, and are usually used for numerical data, not for categorical data [14]. Bayesian methods allow for the introduction of uncertainty into the estimation of missing values and provide estimates of probability distributions. Markov Chain Monte Carlo (MCMC) and Bayesian networks are part of the Bayesian approach. However, in the case of high-dimensional and large-scale data, the computational complexity of Bayesian methods increases, and a priori knowledge is required to specify the probability distribution [15]. Deep learning techniques, such as Recurrent Neural Networks (RNNs) and transformers, have made significant progress in data completion tasks. They can handle complex data patterns and long-term dependencies. However, complex machine learning models may require more computational resources and a large amount of labeled data for training, which, in some cases, may not be easily accessible [16][17][18][19][20]. For tensor-based completion methods applied to high-dimensional tensors, the increase in dimensionality may lead to the curse of dimensionality: the data become very sparse, which raises the proportion of missing data points, makes the completion problem harder, increases the uncertainty of the completed values, and leaves the model prone to overfitting or poor generalization to new data [21][22][23].
The intricacy of traffic flow data, with its multifaceted correlations and patterns, necessitates analytical approaches that can adeptly manage their complexity. Tensor models have risen to prominence in this context, offering a robust framework for capturing high-order correlations and multimodal data interactions inherent in traffic systems [24,25]. These models are particularly valued for their ability to exploit the intrinsic low-rank properties of traffic data, thus enhancing the precision and interpretability of the analyses. A significant contribution in this domain has been made by Chen et al., who utilized Gaussian regular polynomial decomposition to unravel the latent structures within the data, applying Bayesian networks to fill in the gaps of incomplete datasets [15]. Their work stands out for integrating probabilistic models with tensor decomposition, providing a powerful tool for data restoration and uncertainty quantification in traffic flow analysis. Expanding upon these foundational models, subsequent research has delved into dynamic tensor flow models that aim to capture the spatiotemporal dynamics of traffic flow. These models are designed to analyze data across both macroscales, which consider long-term trends and cyclic patterns, and microscales, which focus on the minute-by-minute fluctuations of traffic flow [26][27][28]. By doing so, they provide a granular view of traffic dynamics, offering insight into the temporal progression and spatial distribution of traffic congestion, vehicle speeds, and density. However, the application of these models is not without challenges. One critical issue is the computational and storage inefficiency that arises from redundant calculations within tensor windows, which often results in increased processing time and memory requirements [29]. This redundancy is particularly problematic when dealing with large-scale traffic datasets, where the efficiency of computation can significantly impact the timeliness and usability of the analysis.
To address these inefficiencies, there has been a push toward optimizing tensor calculations, such as employing more sophisticated tensor decomposition techniques that reduce redundancy and accelerate computation. These techniques include the use of block term decomposition, which partitions the tensor into smaller, more manageable blocks, and the implementation of parallel computing strategies that distribute the workload across multiple processors [30]. Furthermore, advancements in hardware, such as the use of GPUs for tensor operations, have also contributed to alleviating the computational burden [30]. In summary, while existing tensor models have laid a solid foundation for traffic data analysis, there is a continual need for innovation to overcome the computational challenges associated with these complex models. Our research contributes to this field by proposing a novel tensor model that not only captures the high-order correlations and spatiotemporal dynamics of traffic flow but also improves computational efficiency through optimized decomposition techniques and algorithmic enhancements.
To combat the aforementioned deficiencies, particularly with large volumes of ETC gantry data, our study introduces an innovative method for missing data completion that leverages an enhanced dynamic tensor decomposition technique. This method not only acknowledges the high-dimensional nature of ETC gantry data but also pioneers the use of a high-order tensor model, refined through an improved tensor Tucker decomposition process. The main contributions of our study are threefold: (1) A dynamic tensor flow model was proposed for decomposing the high-order tensor model of ETC gantry data after analyzing the characteristics of ETC gantry data. (2) The issue of high sparsity in ETC gantry data was addressed through the introduction of an adaptive tensor window to analyze incremental tensor blocks in dynamic tensor flow. Furthermore, an improvement was made to the incremental processing method of the dynamic tensor flow by incorporating a Laplacian matrix. (3) An approximate computation approach was adopted to handle the decomposition of neighboring tensor blocks within the same group, leveraging the characteristics of the gantry data and thereby speeding up the high-order tensor Tucker decomposition.

High-Order Tensor Model for ETC Gantry Data
ETC gantry data on highways exhibit multifaceted characteristics, including complex structural patterns, rapid update frequencies, and varied density values across different data types. To address these complexities, this section introduces a high-order tensor model framework specifically designed for ETC gantry data. This framework aims to filter out attributes that do not pertain to traffic analysis and to streamline the dimensionality of the data. The extraction methodology is depicted in Figure 1.
Sensors 2024, 24, 86
In Figure 1, we provide an overview of our proposed high-order tensor model for processing ETC gantry data. The figure illustrates the architecture of the model, which consists of two main components: the high-order tensor model for ETC gantry data on the left and the traffic data calculation process based on the model on the right. The left component encapsulates ETC gantry data in the form of tensor blocks, indicating the multidimensional properties of data collected from various gantry structures. These blocks are further decomposed into subtensor blocks for detailed analysis and processing. The right component shows the sequence of our improved dynamic tensor decomposition model, which starts from a high-order tensor model, proceeds through dynamic tensor flow decomposition combined with the Laplacian matrix for sparsity control, and culminates in the incremental approximate decomposition of the tensor blocks. The interaction between the subtensor blocks and the improved dynamic decomposition model is the core of achieving precise traffic data computation, which is the ultimate goal of our research.

Multidimensional Tensor Construction for ETC Gantry Data Analysis
We construct a three-dimensional tensor model that integrates time, gantry space, and vehicle user dimensions. We measure the time dimension in seconds, setting the minimum counting unit at one second and tailoring the parameters to reflect the dynamics of traffic flow accurately. The statistical duration determines the time dimension's granularity. The gantry space dimension captures the driving direction, gantry number, and functionality, providing a spatial context for our analysis. We define the vehicle user dimension through parameters such as license plate information, vehicle model, and transaction status, ensuring a comprehensive categorization. Our tensor model records vehicle passage times, offering granular insight into the temporal flow, while the spatial dimension is marked by specific gantry identifiers. However, the model's expansion across time, space, and vehicle diversity leads to increased computational complexity and tensor size, a challenge compounded by significant sparsity. Large periods and areas lacking vehicle transactions result in an unnecessarily inflated tensor, complicating data storage and computation. A three-dimensional tensor model of the ETC gantry data is shown in Figure 2.
In this model, the x-axis represents time, segmented into units of 4 s each, capturing the temporal aspect of traffic flow. The y-axis corresponds to the gantry ID, delineating the specific location of each gantry along the highway. The z-axis indicates the presence or absence of vehicles in different lanes at any given time. Distinct colors represent different lanes for ease of interpretation: orange for the first lane, yellow for the second, and green for the third. This color-coding aids in visualizing the distribution and movement of traffic across various lanes. The diagram exemplifies the model's capacity to encapsulate detailed information about vehicle flow, including temporal and spatial dynamics. However, it also highlights the challenge of tensor inflation due to large periods and areas lacking vehicle transactions, which can complicate data storage and computational processes.
To address these issues, our analysis employs innovative techniques to compress the tensor effectively. These methods enable us to discern significant patterns within the sparse dataset, preserving the fidelity of the temporal and spatial details. Our approach ensures that the model remains both manageable and reflective of the intricate patterns within traffic data.
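To make the sparsity issue concrete, the following minimal sketch stores only the occupied cells of a time × gantry × vehicle tensor as coordinates instead of allocating the full array. The record layout `(timestamp_s, gantry_id, plate)`, the function name `build_sparse_tensor`, and the sample values are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def build_sparse_tensor(records, time_bin=1):
    """Map (timestamp_s, gantry_id, plate) records to coordinate (COO) form.

    Returns the integer coordinates of the nonzero cells and the shape the
    dense tensor would have. The record layout is an assumption.
    """
    gantries = sorted({g for _, g, _ in records})
    plates = sorted({p for _, _, p in records})
    g_index = {g: i for i, g in enumerate(gantries)}
    p_index = {p: i for i, p in enumerate(plates)}
    t0 = min(t for t, _, _ in records)
    coords = np.array(
        [((t - t0) // time_bin, g_index[g], p_index[p]) for t, g, p in records],
        dtype=np.int64,
    )
    shape = (int(coords[:, 0].max()) + 1, len(gantries), len(plates))
    return coords, shape

# Hypothetical passages within one minute at two gantries.
records = [
    (0, "G1", "A123"),
    (4, "G1", "B456"),
    (59, "G2", "A123"),
]
coords, shape = build_sparse_tensor(records)

# Fraction of cells that actually hold a passage record.
density = len(coords) / np.prod(shape)
print(shape, density)
```

Even in this toy case only three of the 240 cells are occupied, which illustrates why a coordinate representation is far smaller than the dense tensor.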

ETC Gantry Tensor Block and Subtensor Block Models
This study introduces gantry tensor block models to encapsulate the traffic statistics of a gantry over one-minute intervals. These models facilitate streamlined calculations through a methodically organized sequence. In alignment with high-order tensor models, these blocks adopt time, space, and vehicle user dimensions as their fundamental structure. Each time dimension within a tensor block corresponds to 1 min, comprising 60 distinct time components. The spatial dimension comprises two key factors: gantry number and gantry malfunction. The vehicle user dimension mirrors that of the high-order tensor model, ensuring consistency. Crucially, the process of dimensionality reduction applied to ETC gantry data retains the intrinsic properties of the original high-dimensional dataset within a more manageable, low-dimensional framework. For practical implementation, attributes such as ETC gantry number, capture time, and license plate identification were selected to construct a subtensor block model, as follows:
$$\mathcal{X}_{LPR} \in \mathbb{R}^{I_{time} \times I_{space} \times I_{car}}$$
Herein, $\mathcal{X}_{LPR}$ represents the subtensor block of the ETC gantry plate recognition data, encompassing the capture time, $I_{time} = [t_{pictime}]$; gantry number, $I_{space} = [s_{id}]$; and identified license plate, $I_{car} = [c_{license}]$.
Similarly, the ETC gantry transaction data are synthesized into a subtensor block model, enabling the assessment of transactional patterns:
$$\mathcal{X}_{trade} \in \mathbb{R}^{I_{time} \times I_{space} \times I_{car}}$$
Herein, $\mathcal{X}_{trade}$ represents the subtensor block of the ETC gantry transaction data, encompassing the transaction time, $I_{time} = [t_{tradetime}]$; gantry number, $I_{space} = [s_{id}]$; and vehicle license plate recognition information, vehicle type, and transaction outcome, $I_{car} = [c_{license}, c_{type}, c_{match}]$.

Extension of the Subtensor Block Models to the Tensor Block Model
The subtensor block models are integrated into a higher-order tensor space: components within the same dimension and of identical order are merged, while unique orders are preserved. Equation (3) delineates the method for merging subtensor blocks, in which $\mathcal{A} \in \mathbb{R}^{I \times J \times K}$ and $\mathcal{B} \in \mathbb{R}^{I \times J}$ represent tensor blocks with different dimensions.
By expanding each subtensor block model, we construct a tensor block model, which is then sequentially organized along the timeline to form a composite, higher-order tensor model. Parameters of identical order within each dimension constitute the foundational attribute parts, while those of varying orders are categorized as extensible attribute parts, facilitating a modular approach to model construction.

Traffic Data Calculation Based on the High-Order Tensor Model
Traffic data extraction was performed through calculations using a high-order tensor model. Figure 3a depicts this model, which utilizes 30 min traffic statistics from two ETC gantries, each with three lanes. In this model, the vehicle types within tensor blocks are color-coded, enhancing the model's visual interpretation. This tensor model is adaptable to various spatial component transformations, including expansion, reflection, shear, projection, and rotation, enabling complex, multidimensional traffic data calculations.
For instance, Figure 3b illustrates the calculation of traffic data for lane 2 at the G005032001000110010 gantry. By segmenting a slice of the tensor model for this specific gantry, the model yields comprehensive traffic information for lane 2. This slice encompasses tensor blocks representing different time intervals and crucial traffic parameters, such as traffic volume, average time headway, vehicle type distribution, and space occupancy rate. Additionally, the process of retrieving vehicle-level information is demonstrated in Figure 3c. By entering the license plate details of a specific vehicle, the model directly locates the corresponding tensor block. Subsequently, traffic parameters, including interval speed and time headway for that vehicle, are computed.
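The slice-based calculation described above can be sketched as follows. This is a minimal illustration that assumes a dense presence tensor indexed as (time in seconds, gantry, lane); the array layout, the helper name `lane_statistics`, and the sample passages are hypothetical and chosen only to show how traffic volume and average time headway fall out of a single tensor slice.

```python
import numpy as np

# Hypothetical presence tensor: 1 where a vehicle passed (second, gantry, lane).
presence = np.zeros((60, 2, 3), dtype=np.int8)
for t in (3, 10, 25, 40):
    presence[t, 0, 1] = 1  # gantry index 0, lane 2 (index 1)
presence[7, 1, 0] = 1      # an unrelated passage at the other gantry

def lane_statistics(tensor, gantry, lane):
    """Return (traffic volume, mean time headway in s) for one lane slice."""
    lane_slice = tensor[:, gantry, lane]
    times = np.flatnonzero(lane_slice)   # seconds at which a vehicle passed
    volume = int(lane_slice.sum())
    # Time headway is the gap between consecutive passages in the same lane.
    headway = float(np.diff(times).mean()) if volume > 1 else float("nan")
    return volume, headway

volume, headway = lane_statistics(presence, gantry=0, lane=1)
print(volume, headway)
```

Indexing by license plate instead of by lane would follow the same pattern, with the vehicle axis fixed and the time and gantry axes scanned for the vehicle's passages.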

ETC Gantry Data Completion Method Based on Improved Tensor Decomposition
This study introduces an Improved Dynamic Tensor Decomposition (IDTD) model, specifically designed to extract high-dimensional and dynamic traffic features with enhanced accuracy and efficiency. To counter the sparsity issue prevalent in high-dimensional data, a Laplacian matrix was employed to regulate parameter sparsity. Furthermore, approximate calculations were conducted to ascertain the tensor core and factor matrices, significantly reducing computational overhead.

Improved Dynamic Tensor Decomposition Model
The sparse nature of traffic data presents significant challenges. The "vehicle user-time stamp" information matrix typically exhibits low density, which becomes increasingly problematic as the volume of highway vehicles and the frequency of traffic record timestamps surge, thereby diminishing the information density related to vehicles. The addition of a spatial dimension exacerbates this sparsity. Our approach involves analyzing tensor flow within a stipulated timeframe and introducing a tensor window to the data tensor flow. The tensor flow for a fixed period with a tensor window size of $w$ is denoted as:
$$\mathcal{X}(t, w) = \{\mathcal{X}_{t-w+1}, \ldots, \mathcal{X}_t\}, \quad \mathcal{X}_t \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$$
As depicted in Figure 4a, the tensor flow $\mathcal{X}_t \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, $(1 \le t \le T)$, performs local processing using a sliding tensor window of size $w$. Moreover, dynamic traffic flow information is captured using the dynamic tensor model. The decomposition of the Nth-order tensor flow $\mathcal{X}_t \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, $(1 \le t \le T)$, is articulated as:
$$\mathcal{X}_t \approx \mathcal{G}_t \times_1 U_1 \times_2 U_2 \times_3 U_3$$
where $U_n$ represents the factor matrix corresponding to mode $n$ (the principal component of mode $n$); $\mathcal{G}_t$ is the core tensor, encapsulating independent features that articulate the interactions among various pattern principal components; and $U_1$, $U_2$, and $U_3$ are the factor matrices for modes 1, 2, and 3, respectively. The constituents of the core tensor stream, $\mathcal{G}_t$, mirror the interplay between principal components and the temporal dynamics of this tensor series. The intricacies of tensor flow decomposition are exemplified in Figure 4b.
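As a rough illustration of the two ideas in this subsection, the sketch below selects a sliding window of tensor blocks from a stream and computes a Tucker decomposition of one block. It uses a plain truncated higher-order SVD (HOSVD) as a stand-in; this is a standard technique swapped in for illustration, not the improved dynamic decomposition proposed in this study. The helper names, the zero-based time index, and the random test stream are assumptions.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `matrix` (J x I_mode) into `tensor` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: returns a core tensor and one factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the approximation core x_1 U_1 x_2 U_2 ... from its parts."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

def tensor_window(stream, t, w):
    """Blocks t-w+1 .. t of the stream (zero-based t, clipped at the start)."""
    return stream[max(0, t - w + 1): t + 1]

# Hypothetical stream of six 4 x 3 x 5 tensor blocks.
rng = np.random.default_rng(0)
stream = [rng.random((4, 3, 5)) for _ in range(6)]
window = tensor_window(stream, t=5, w=3)
core, factors = hosvd(window[-1], ranks=(2, 2, 2))
approx = reconstruct(core, factors)
print(len(window), core.shape, approx.shape)
```

With full ranks the reconstruction is exact; lowering the ranks trades accuracy for a smaller core tensor, which is the trade-off the approximate decomposition in this study is designed to manage.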

Laplacian Matrix for Sparsity Control
The dynamic tensor flow decomposition model for ETC gantry data is illustrated in Figure 5. This model postulates that a segment of vehicle passage information is represented by the tuple $(c, e, v, t)$, where the passage information $v$ for vehicle $c$ at gantry $e$ is recorded in period $t$ $(t = 1, \ldots, T)$. The ETC gantry data, structured as a time series, form a dynamic tensor sequence $\mathcal{X}_t \in \mathbb{R}^{n_c \times n_e \times n_v}$, where $n_c$ is the count of vehicles passing within the chosen time, $n_e$ denotes the quantity of ETC gantries, and $n_v$ represents the volume of passage data points. Leveraging Tucker decomposition, the dynamic tensor sequence decomposes as:
$$\mathcal{X}_t \approx \mathcal{Y}_t \times_1 C_t \times_2 E_t \times_3 V_t$$
where $\mathcal{Y}_t \in \mathbb{R}^{r^{(c)} \times r^{(e)} \times r^{(v)}}$ is the core tensor sequence encapsulating dynamic behavioral patterns that describe the interplay among vehicles, gantries, and passage information. It signifies the likelihood of the occurrence of passage information from group $j^{(v)}$ at group $j^{(e)}$ gantries for vehicles belonging to group $j^{(c)}$ before time $t$.
The matrix $C_t \in \mathbb{R}^{n_c \times r^{(c)}}$ serves as the vehicle factor matrix up until time $t$, where $C_t(i^{(c)}, j^{(c)})$ indicates the probability that the $i^{(c)}$th vehicle belongs to the $j^{(c)}$th vehicle group.
The matrix $E_t \in \mathbb{R}^{n_e \times r^{(e)}}$ is the factor matrix for the ETC gantries before time $t$, and $E_t(i^{(e)}, j^{(e)})$ illustrates the likelihood that the $i^{(e)}$th gantry belongs to the $j^{(e)}$th gantry group.
The matrix $V_t \in \mathbb{R}^{n_v \times r^{(v)}}$ represents the factor matrix for passage information until time $t$, and $V_t(i^{(v)}, j^{(v)})$ denotes the probability that the $i^{(v)}$th passage belongs to the $j^{(v)}$th passage information group.
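Written element-wise, and assuming the standard three-mode Tucker form with the core tensor and factor matrices defined above, each entry of the dynamic tensor is approximated by a sum over group memberships. The expansion below is a worked restatement under that assumption rather than a formula taken from the original text:

```latex
\mathcal{X}_t\bigl(i^{(c)}, i^{(e)}, i^{(v)}\bigr) \approx
\sum_{j^{(c)}=1}^{r^{(c)}} \sum_{j^{(e)}=1}^{r^{(e)}} \sum_{j^{(v)}=1}^{r^{(v)}}
\mathcal{Y}_t\bigl(j^{(c)}, j^{(e)}, j^{(v)}\bigr)\,
C_t\bigl(i^{(c)}, j^{(c)}\bigr)\,
E_t\bigl(i^{(e)}, j^{(e)}\bigr)\,
V_t\bigl(i^{(v)}, j^{(v)}\bigr)
```

Read this way, a missing entry is estimated by weighting the group-level pattern in the core tensor by how strongly the vehicle, the gantry, and the passage record each belong to the corresponding groups.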
The tensor model for ETC gantry traffic data is inherently intricate. While large tensor windows yield strong classification capabilities, they also escalate computational complexity [31]. To maintain the precision of tensor approximation decomposition within homogeneous tensor groups, we introduced variable tensor window sizes. The interrelation among discrete tensor blocks was quantified using the Pearson correlation coefficient $r_{ij}$, with the tensor window size set according to the threshold $r_{ij} > 0.95$. The Pearson correlation coefficient and the corresponding tensor window size are computed as follows:
$$r_{ij} = \frac{\operatorname{cov}(\mathcal{X}_i, \mathcal{X}_j)}{\sigma_{\mathcal{X}_i}\,\sigma_{\mathcal{X}_j}}$$
with the window criterion taking the value 1 when $r_{ij} > 0.95$ and 0 when $r_{ij} \le 0.95$.
Figure 6 illustrates the range of maximum and minimum tensor window values, which vary according to different correlation coefficients. To minimize the computational load while preserving the integrity of the tensor decomposition approximations, this study combined the established discriminative threshold of the Pearson correlation coefficient with the concrete context of the ETC gantry data [32]. A correlation coefficient exceeding 0.95 ($r_{ij} > 0.95$) is indicative of a strong correlation, prompting the adjustment of the tensor window size to this stringent standard.
To address the challenges posed by high data sparsity, the Laplacian matrices L (c) , L (e) , and L (v) were employed, quantifying the similarity among individual entities and their collective groups, namely, passing vehicles, ETC gantries, and transit information.The Laplacian matrix is defined as: where D is the degree matrix, and W is the similarity matrix.The (i, j)th matrix element represents the similarity between the ith and jth entities.The similarity degrees for vehicles, gantries, and transit data are tailored according to parameters like car model, geographic proximity, and interval headway time.Direct comparisons among entities are facilitated by numerical differentials, with vehicle model specifics calibrated by the conversion factor detailed in Table 1.
= (  ) (  ) = 1,  0.95 0,  0.95 Figure 6 illustrates the range of maximum and minimum tensor window values, which vary according to different correlation coefficients.To minimize the computational load while preserving the integrity of the tensor decomposition approximations, this study synthesized the established discriminative threshold of the Pearson correlation coefficient with the concrete context of the ETC gantry data [32].A correlation coefficient exceeding 0.95  > 0.95 is indicative of a strong correlation, prompting the adjustment of the tensor window size to this stringent standard.
To address the challenges posed by high data sparsity, the Laplacian matrices  ( ) ,  ( ) , and  ( ) were employed, quantifying the similarity among individual entities and their collective groups, namely, passing vehicles, ETC gantries, and transit information.The Laplacian matrix is defined as: where  is the degree matrix, and  is the similarity matrix.The (, )th matrix element represents the similarity between the th and th entities.The similarity degrees for vehicles, gantries, and transit data are tailored according to parameters like car model, geographic proximity, and interval headway time.Direct comparisons among entities are facilitated by numerical differentials, with vehicle model specifics calibrated by the conversion factor detailed in Table 1.The elements of the similarity matrix, , along each column are aggregated to yield  values, and these  values are then positioned on the main diagonal to construct the degree matrix, , as an  ×  diagonal matrix, leaving the nondiagonal entries at zero.The derived Laplacian matrices,  ( ) ,  ( ) , and  ( ) , are symmetric and positive semi-definite, each with its minimum eigenvalue being zero.This characteristic is instrumental for  The elements of the similarity matrix, W, along each column are aggregated to yield n values, and these n values are then positioned on the main diagonal to construct the degree matrix, D, as an n × n diagonal matrix, leaving the nondiagonal entries at zero.The derived Laplacian matrices, L (c) , L (e) , and L (v) , are symmetric and positive semi-definite, each with its minimum eigenvalue being zero.This characteristic is instrumental for the execution of incremental approximate decomposition of the tensor block, simplifying the computational process involved in the analysis of the ETC gantry data.
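The construction of a Laplacian matrix from a similarity matrix, as described above, can be illustrated with a minimal sketch. The function name `laplacian` and the similarity values are hypothetical and chosen only for illustration; the sketch assumes a symmetric similarity matrix stored as a list of rows.

```python
def laplacian(similarity):
    """Return L = D - W for a square, symmetric similarity matrix W (list of rows)."""
    n = len(similarity)
    # Degree of entity j: the sum of column j of W.
    degrees = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    return [
        [(degrees[i] if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical similarity among three gantries (illustrative values only).
W = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]
L = laplacian(W)
# Each row of L sums to zero, which is why zero is an eigenvalue of L.
row_sums = [sum(row) for row in L]
```

Because every row sums to zero, the all-ones vector is an eigenvector with eigenvalue zero, consistent with the minimum eigenvalue stated in the text.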

Calculation of the Incremental Approximate Decomposition of the Tensor Block
The ETC gantry data exhibit a substantial spatial correlation. Given the minor variations within neighboring tensor blocks, these correlations can be harnessed to conserve computational resources. The sparse increment at time $t$, denoted by $\Delta\mathcal{X}_t = \mathcal{X}_{t+1} - \mathcal{X}_t$, facilitates a more efficient dynamic tensor decomposition.
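As a minimal sketch of the sparse increment, the example below stores each tensor block as a dictionary that maps an index tuple to a value and keeps only the entries that change between two neighboring blocks. The storage layout, the function name `sparse_increment`, and the sample entries are assumptions made for illustration.

```python
def sparse_increment(x_t, x_next):
    """Return the nonzero entries of x_next - x_t for tensors stored as {index: value}."""
    delta = {}
    for idx in set(x_t) | set(x_next):
        diff = x_next.get(idx, 0.0) - x_t.get(idx, 0.0)
        if diff != 0.0:
            delta[idx] = diff
    return delta


# Hypothetical (vehicle, gantry, passage) entries in two neighboring tensor blocks.
x_t = {(0, 0, 0): 1.0, (1, 2, 0): 1.0}
x_next = {(0, 0, 0): 1.0, (1, 2, 0): 1.0, (2, 1, 1): 1.0}
delta = sparse_increment(x_t, x_next)
```

Only the single new passage record survives in `delta`, which is the property that keeps the incremental update inexpensive when neighboring blocks differ little.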

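1. Initialization of the Core Tensor and Constraint Terms: Utilizing Tucker decomposition, we acquire the initial tensor $\mathcal{X}_1$, alongside the constraint terms $L^{(m)}|_{m=1}^{M} \in \mathbb{R}^{n_m \times n_m}$, within the $M$-dimensional tensor sequence $\mathcal{X}_t \in \mathbb{R}^{I_1 \times I_2 \times \dots \times I_M}$.
2. Initial Tensor Covariance Matrix Calculation: At time $t = 1$, the covariance matrix for the $m$th dimension of the initial tensor $\mathcal{X}_1$ is defined as:
$$C_1^{m} = X_1^{(m)} X_1^{(m)T} + \mu^{(m)} L^{(m)}$$
where $X_1^{(m)}$ is the unfolded matrix of $\mathcal{X}_1$ along the $m$th dimension, and $\mu^{(m)}$ is the weight of the constraint term $L^{(m)}$.
3. Factor Matrix and Core Tensor Calculation: The factor matrix $U_1^{(m)} \in \mathbb{R}^{n_m \times r_m}$ comprises the leading $r_m$ eigenvectors of the covariance matrix $C_1^{m}|_{m=1}^{M}$. The core tensor $\mathcal{Y}_1 \in \mathbb{R}^{r_1 \times \dots \times r_M}$ is computed correspondingly. This procedure is outlined in Table 2.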
Table 2. Initial core tensor and factor matrix calculation process.

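4. Update of the Factor Matrix $U_{t+1}^{(m)}$: The revised factor matrix, $U_{t+1}^{(m)}$, is obtained by updating each eigenvalue, $\lambda_{t+1,i}^{m}$, and eigenvector, $u_{t+1,i}^{m}$, as
$$\lambda_{t+1,i}^{m} = \lambda_{t,i}^{m} + \Delta\lambda_{t,i}^{m}, \qquad u_{t+1,i}^{m} = u_{t,i}^{m} + \Delta u_{t,i}^{m}$$
where $\lambda_{t,i}^{m}$ and $u_{t,i}^{m}$ are the $i$th eigenvalue and eigenvector of the covariance matrix $C_t^{m}$, and $\Delta\mathcal{X}_t$ is the change in $\mathcal{X}_t$.
Substituting the expressions for $\lambda_{t+1,i}^{m}$ and $u_{t+1,i}^{m}$ into the eigenvalue equation of $C_{t+1}^{m}$ and omitting $t$ gives:
$$\left[(X^{(m)} + \Delta X^{(m)})(X^{(m)} + \Delta X^{(m)})^{T} + \mu^{(m)} L^{(m)}\right](u_i^{m} + \Delta u_i^{m}) = (\lambda_i^{m} + \Delta\lambda_i^{m})(u_i^{m} + \Delta u_i^{m})$$
By retaining only the first-order terms, we simplify to:
$$(X^{(m)} X^{(m)T} + \mu^{(m)} L^{(m)})\,\Delta u_i^{m} + (X^{(m)} \Delta X^{(m)T} + \Delta X^{(m)} X^{(m)T})\,u_i^{m} = \lambda_i^{m}\,\Delta u_i^{m} + \Delta\lambda_i^{m}\,u_i^{m}$$
The orthogonality of the eigenvectors permits the change $\Delta u_i^{m}$ to be expressed using the current eigenvectors:
$$\Delta u_i^{m} = \sum_{j} \alpha_{ij}\,u_j^{m}$$
where $\alpha_{ij}$ is the constant to be calculated. Substituting this expansion yields:
$$(X^{(m)} X^{(m)T} + \mu^{(m)} L^{(m)}) \sum_{j} \alpha_{ij}\,u_j^{m} + (X^{(m)} \Delta X^{(m)T} + \Delta X^{(m)} X^{(m)T})\,u_i^{m} = \lambda_i^{m} \sum_{j} \alpha_{ij}\,u_j^{m} + \Delta\lambda_i^{m}\,u_i^{m}$$
Multiplying both sides on the left by $u_k^{mT}$ ($k \neq i$) simplifies the equation further:
$$\alpha_{ik} = \frac{u_k^{mT}\,(X^{(m)} \Delta X^{(m)T} + \Delta X^{(m)} X^{(m)T})\,u_i^{m}}{\lambda_i^{m} - \lambda_k^{m}}$$
The eigenvectors satisfy unit normalization, which yields the following:
$$(u_i^{m} + \Delta u_i^{m})^{T}(u_i^{m} + \Delta u_i^{m}) = 1 + 2\alpha_{ii} + \sum_{j} \alpha_{ij}^{2} = 1$$
Eliminating the higher-order term leads to $\alpha_{ii} = 0$.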
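The first-order eigenpair update can be checked numerically with a small sketch. It relies on the standard perturbation results for a symmetric matrix: the eigenvalue change is $u_i^{T}\,\Delta C\,u_i$, and the coefficient on $u_k$ is $u_k^{T}\,\Delta C\,u_i / (\lambda_i - \lambda_k)$. The function name `first_order_update` is hypothetical, the perturbation `delta_c` is passed in directly rather than assembled from $X \Delta X^{T} + \Delta X X^{T}$, and distinct eigenvalues are assumed.

```python
def first_order_update(eigvals, eigvecs, delta_c):
    """First-order update of the eigenpairs of a symmetric matrix C -> C + delta_c.

    eigvals[i] and eigvecs[i] (a list of floats) are the current eigenpairs;
    delta_c is the symmetric perturbation given as a list of rows.
    Assumes that all eigenvalues are distinct.
    """
    n = len(eigvals)

    def quad(u, v):
        # Bilinear form u^T * delta_c * v.
        return sum(u[a] * delta_c[a][b] * v[b]
                   for a in range(len(u)) for b in range(len(v)))

    new_vals, new_vecs = [], []
    for i in range(n):
        # Eigenvalue change: u_i^T * delta_c * u_i.
        new_vals.append(eigvals[i] + quad(eigvecs[i], eigvecs[i]))
        vec = list(eigvecs[i])
        for k in range(n):
            if k == i:
                continue  # alpha_ii = 0 to first order
            alpha = quad(eigvecs[k], eigvecs[i]) / (eigvals[i] - eigvals[k])
            vec = [v + alpha * uk for v, uk in zip(vec, eigvecs[k])]
        new_vecs.append(vec)
    return new_vals, new_vecs


# C = diag(3, 1) has eigenpairs (3, e1) and (1, e2); delta_c is a small symmetric change.
vals, vecs = first_order_update(
    [3.0, 1.0],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.02, 0.01], [0.01, -0.01]],
)
```

For this small perturbation, the updated leading eigenvalue of 3.02 differs from the exact value (about 3.02005) only in the second-order term, which is the error the approximation accepts in exchange for avoiding a full eigendecomposition.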

5. Updating the Core Tensor, $\mathcal{Y}_{t+1} \in \mathbb{R}^{r_1 \times \dots \times r_M}$: The core tensor can be updated following the update of the factor matrix, $U_{t+1}^{(m)} \in \mathbb{R}^{n_m \times r_m}$; the calculation process for the core tensor is summarized in Table 3.
Table 3. Calculation flow for updating the core tensor.

ETC Gantry Data Completion Based on Improved Tensor Dynamic Decomposition
Restoration of missing ETC gantry data leverages the correlation of traffic data among tensor blocks. We approximate the core tensor for the block with missing data by analyzing the decomposition of the surrounding tensor blocks.

Description of the ETC Gantry Data Completion Issue
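It is assumed that there exists a higher-order tensor block, $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, with missing values, and that the set of observable data within $\mathcal{X}$ is $\{x_{ijk} \mid (i, j, k) \in \Phi\}$, where $\Phi$ is the set of observed indices. The elemental values outside the observable dataset should be calculated as rapidly as possible while ensuring accuracy. Therefore, the observable dataset within $\mathcal{X}$ is projected onto each spatial dimension of the tensor block as follows: if $(i, j, k) \in \Phi$, then the corresponding element of $P_\Phi(\mathcal{X})$ is the observed value $x_{ijk}$; otherwise, the elemental value is zero:
$$[P_\Phi(\mathcal{X})]_{ijk} = \begin{cases} x_{ijk}, & (i, j, k) \in \Phi \\ 0, & \text{otherwise} \end{cases}$$
where $P_\Phi$ is the projection operator.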
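The projection operator can be sketched in a few lines. The nested-list tensor layout, the function name `project`, and the sample values are assumptions made for illustration; `observed` plays the role of the index set $\Phi$.

```python
def project(tensor, observed):
    """Apply P_Phi: keep entries whose (i, j, k) index is in `observed`, zero the rest."""
    return [
        [
            [value if (i, j, k) in observed else 0.0 for k, value in enumerate(row)]
            for j, row in enumerate(matrix)
        ]
        for i, matrix in enumerate(tensor)
    ]


# Hypothetical 2 x 2 x 2 tensor block with three observed entries.
x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
observed = {(0, 0, 0), (0, 1, 1), (1, 0, 1)}
projected = project(x, observed)
```

Entries outside `observed` become zero, so any error measured on `projected` depends only on the observed positions.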

Construction of the Objective Function
The method adopted for ETC gantry data completion, rooted in Tucker decomposition, addresses the associated optimization problem through the objective function $f(\mathcal{G}, U^{(1)}, U^{(2)}, U^{(3)})$, in which the core tensor $\mathcal{G}$ and the factor matrices $U^{(1)}$, $U^{(2)}$, and $U^{(3)}$ are the variables to be solved. The missing data are completed by obtaining the core tensor and factor matrices that minimize this function.
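For orientation, one form that Tucker-based completion objectives commonly take is sketched below. It is an assumption and not necessarily the authors' exact objective: the data-fitting term measures the reconstruction error on the observed entries only, and the optional regularization term reuses the weight $\mu^{(i)}$ and Laplacian $L^{(i)}$ symbols from the covariance definition.

```latex
f\left(\mathcal{G}, U^{(1)}, U^{(2)}, U^{(3)}\right)
  = \frac{1}{2}\left\| P_\Phi\!\left(\mathcal{X}
      - \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}\right) \right\|_F^2
  + \sum_{i=1}^{3} \frac{\mu^{(i)}}{2}\,
      \operatorname{tr}\!\left(U^{(i)T} L^{(i)} U^{(i)}\right)
```

In this form, the first term vanishes when the Tucker reconstruction matches every observed entry, and the second term penalizes factor matrices that assign dissimilar group memberships to similar entities.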

1. Initial Solution: Taking into account the properties of adjacent tensor blocks, the initial solution for the objective function is estimated using the core tensor and factor matrices, represented as $\mathcal{G}_0$ and $U_0^{(i)}$, $i = 1, 2, 3$.
2. Optimization of Solutions: The solution is refined through iterative computation of the approximate gradients for each variable. Each iteration updates the variables of the objective function using $\nabla_{\mathcal{G}} f$ and $\nabla_{U^{(i)}} f$, the gradients of $f(\mathcal{G}, U^{(1)}, U^{(2)}, U^{(3)})$ with respect to $\mathcal{G}$ and $U^{(i)}$, respectively, together with $L_{\mathcal{G}}$ and $L_{U^{(i)}}$, the corresponding Lipschitz constants.
Directly applying abnormal ETC gantry plate recognition data in the computation of traffic parameters can severely disrupt model results and analyses. The preprocessing of abnormal plate recognition data in this section includes the identification of ETC gantry abnormal plate recognition data based on classification concepts, preliminary repair of abnormal plate recognition data, and evaluation of the integrity of ETC gantry plate recognition data.
Initially, by establishing a normal plate recognition data model using standards and norms, one can predetermine the characteristics of normal data within the data table. This includes correct data formats/rules and distribution models for each attribute. The ETC gantry plate recognition data table is then checked against these standards, including data format, degree of fit within distribution models, repetition of vehicle passage information through each gantry, and vehicle travel direction. Differences between the tested data and the normal data characteristics are used to identify anomalies.
In the second step, the preliminary repair of abnormal ETC gantry plate recognition data involves the correction of erroneous, duplicate, and missing data through deletion, expansion, completion, and transformation, thus enhancing the data usability. Different processing methods can be selected based on the cause and form of the anomalies. For incorrect plate recognition data, identified errors are deleted from the standard data table and then reclassified as missing data, moving into the repair process for missing plate recognition data. The issue of missing data is resolved using automated completion and repair methods based on auxiliary analysis data tables, ETC gantry toll transaction tables, and ETC gantry abnormal event logs, which are manually processed, to ensure the accuracy of the data and assist in the completion of missing ETC gantry plate recognition data. For instance, if part of the data for a vehicle is missing in the gantry plate recognition data table, one can use the complete information available for that vehicle within the table to search the entire vehicle passage record in the auxiliary analysis tables, utilizing the correlation of the same vehicle's short-term operational status to extract usable data for completion. In cases where multiple identical entries appear in the ETC gantry plate recognition data table, the uniqueness of the billing transaction identification number is used as a standard to retain one instance of the duplicate plate recognition data and discard the rest.
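The duplicate-handling rule, which keeps one record per billing transaction identification number, can be sketched as follows. The field names (`transaction_id`, `plate`, `gantry`) and the sample rows are hypothetical and do not reflect the actual schema of the ETC gantry data table.

```python
def deduplicate(records, key="transaction_id"):
    """Keep the first record for each transaction ID and drop later repeats."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


# Hypothetical plate recognition rows; the field names are illustrative only.
rows = [
    {"transaction_id": "T001", "plate": "A12345", "gantry": "G1"},
    {"transaction_id": "T001", "plate": "A12345", "gantry": "G1"},
    {"transaction_id": "T002", "plate": "B67890", "gantry": "G1"},
]
unique_rows = deduplicate(rows)
```

Keeping the first occurrence preserves the original record order, which matters when later steps rely on passage sequence.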
Finally, this paper quantitatively describes errors, omissions, and duplications in plate recognition data information and evaluates the quality of ETC gantry plate recognition data. A commonly used data quality evaluation index formula is as follows:
$$TDIN = \frac{NUM_{valid}}{NUM_{all}} \times 100\%$$
Within the formula, $NUM_{valid}$ represents the amount of valid data in the original ETC gantry plate recognition data table that is nonredundant and complete, $NUM_{all}$ is the theoretical total amount of data that should be in the ETC gantry plate recognition data table, and $TDIN$ is the evaluation index for the quality of the ETC gantry plate recognition data.
Issues such as network transmission delays can lead to delayed uploads of plate recognition data. The ETC gantry system tolerates delays in data upload to a certain extent, but uploads exceeding a specific time range can affect highway toll transactions. Therefore, this section sets $t$ as the acceptable time range. When the data upload time is within this range, it is marked as a normal upload; otherwise, it is considered delayed. The data evaluation index is adjusted to account for delayed and duplicate records, where $NUM_{delay}$ represents the volume of delayed uploaded data, $NUM_{duplicate}$ is the volume of duplicate data in the ETC gantry plate recognition data table, and $NUM_{obtain}$ is the actual total volume of data obtained by the ETC gantry. After identifying and initially repairing the abnormal data, the quality of the ETC gantry plate recognition data is evaluated. The data quality of the G50 Nanjing section ETC gantry for January is illustrated in Figure 7.
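A minimal sketch of the quality index is given below, reading the index as the percentage of valid records in the theoretical total. The way delayed and duplicate records enter the valid count (subtracted from the obtained total) is an assumption, as are the function names and the sample counts.

```python
def valid_count(num_obtain, num_duplicate, num_delay):
    """Assumed rule: valid records = obtained - duplicates - delayed uploads."""
    return num_obtain - num_duplicate - num_delay


def quality_index(num_valid, num_all):
    """Return the data quality index as a percentage of the theoretical total."""
    if num_all <= 0:
        raise ValueError("num_all must be positive")
    return 100.0 * num_valid / num_all


# Hypothetical daily counts (not taken from the case study).
num_valid = valid_count(num_obtain=940, num_duplicate=15, num_delay=25)
index = quality_index(num_valid, num_all=1000)
```

With these counts, 900 of 1000 expected records are valid, which gives an index of 90%.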
Sensors 2024, 24, 86
The quality of the ETC gantry data was generally high, with the monthly quality level exceeding 85% and an average data quality level of 90.4% for the month. There were 11 days in the month when the ETC gantry data quality fell below 90% (referred to as poor data quality), which is nonetheless superior to the detection performance of most detectors on highways. Utilizing meteorological data from the ETC gantry system's built-in weather detection equipment and referencing weather data, an investigation was conducted on the meteorological conditions in January 2022 in the area of the G50 Nanjing section (Suzhou). The region experienced minimal variation in January temperatures, with an average temperature of 7.93 °C, including an outlier of 1.4 °C. The region's precipitation for the month was 74.8 mm over 9 days of rainfall, with the specific weather conditions shown in Figure 8.
Given the minor fluctuations in regional temperatures and the region's southern location, where severe cold does not occur, it can be inferred that temperature has a minimal impact on the ETC gantry's detection ability. This paper primarily analyzed the quality of ETC gantry data under poor visibility conditions, such as overcast and rainy days. By comparing the meteorological conditions on dates when the ETC gantry data quality was above 90% with those below 90%, it was found that rainy days with poor quality accounted for 77.78% of the total number of rainy days and 63.64% of the total number of days with poor quality. This suggests that the quality of the ETC gantry data is affected by adverse weather conditions such as rainfall.
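The two reported percentages can be reproduced with a short calculation. The counts of 9 rainy days and 11 poor-quality days come from the text; the overlap of 7 days is inferred here, since it is the only whole number consistent with both percentages.

```python
rainy_days = 9          # days of rainfall in January (from the text)
poor_quality_days = 11  # days with data quality below 90% (from the text)
rainy_and_poor = 7      # inferred overlap, not stated explicitly in the text

# Share of rainy days that also had poor data quality.
share_of_rainy = round(100 * rainy_and_poor / rainy_days, 2)
# Share of poor-quality days that were rainy.
share_of_poor = round(100 * rainy_and_poor / poor_quality_days, 2)
```

Both values match the reported figures, so on this reading 7 of the 9 rainy days fell below the 90% quality level.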
The accuracy and reliability of the ETC plate recognition data directly impact the operation of highway toll collection. An analysis of the actual usage of the G50 Nanjing section's ETC gantry data reveals the following advantages: (1) Quick updates, with a large volume of data accumulating over a short period of time. Millions of vehicle passage records can be generated daily. Moreover, the data have high consistency, strong structure, and low update costs. (2) Rich dimensions of ETC gantry data provide multiple perspectives for traffic data analysis. The ETC gantry system includes static data, such as vehicle information, toll station information, pass card information, and toll system information, as well as dynamic data, such as travel information, charging information, and system verification information.
The collection, transmission, and storage of plate recognition data require a multitude of devices and network services, which introduces the possibility of missed captures and identification errors: (1) Evasion tactics such as covering license plates, along with unstable equipment conditions, can result in missing fields and data anomalies within ETC gantry data, making it impossible to guarantee data quality. Concurrently, improper use by drivers, such as excessive speed, loss of ETC electronic tags, the application of special windshield films, and small vehicles being obscured by larger vehicles in tow, can lead to detection failures at the gantry, impacting data accuracy and leading to issues such as transaction anomalies, nonrecognition of vehicles, and missing vehicle statistics in the G50 Nanjing section ETC gantry. (2) The ETC gantry span in the G50 Nanjing section reaches up to 9 km, with insufficient monitoring of traffic flow between gantries. Additionally, the uneven and sparse distribution of ETC gantry detectors may lead to biases in the description of vehicle behavior. For instance, a vehicle with a license plate ending in 375 recorded passage times at two gantries, G005032001000310010 and G005032001000410010, approximately 2 min apart. With the gantries about 6 km apart, this corresponds to a speed of 180 km/h, which is inconsistent with road traffic rules. (3) ETC gantry video detectors are used for vehicle plate data collection. These video detectors rely on optical principles, and factors such as dust, shadows, weather, and lighting can affect the detection results; hence, natural elements such as rainy weather and nighttime conditions impact the accuracy of gantry detection. (4) The minimum distance between gantries in the same direction on the G50 Nanjing section is set at 400 m, with two ETC gantries spaced 500 m apart, and three road segments have relatively small distances between them. Figure 9 shows the proportion of duplicate vehicle detection data for January among ETC gantry segments, with blue, red, and green representing the three closely spaced segments, and grey representing other segments. Duplicate detection issues are more significant in the three closely spaced segments.
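The speed inconsistency noted in item (2) can be reproduced with a simple plausibility check on the interval speed between two gantries. The function names and the 120 km/h threshold are assumptions made for illustration and are not taken from the paper.

```python
def interval_speed_kmh(distance_km, travel_time_s):
    """Mean speed between two gantries from their spacing and the passage-time gap."""
    if travel_time_s <= 0:
        raise ValueError("travel time must be positive")
    return distance_km * 3600.0 / travel_time_s


def is_plausible(speed_kmh, limit_kmh=120.0):
    """Flag speeds above an assumed upper bound as implausible."""
    return speed_kmh <= limit_kmh


# Gantries about 6 km apart, passage times about 2 min (120 s) apart.
speed = interval_speed_kmh(6.0, 120.0)
plausible = is_plausible(speed)
```

The check returns 180 km/h for the example in the text and marks it as implausible, so such a record pair would be routed to the anomaly-handling step rather than used directly.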
For tensor model construction, PyTorch 1.11 was the framework of choice. The subtensor block models were formulated using data fields from each ETC gantry within the G50 Nanjing section, encompassing aspects such as token recognition, transactions, foundational information, and atypical state indicators. These subtensor components were methodically categorized per dimension into basic and expandable attributes. The expandable attributes were then amalgamated and preserved based on the foundational attributes of each dimension to establish a three-dimensional, high-order tensor block model for the ETC gantry system.
The method's efficacy was gauged by its accuracy and computational complexity. We contrasted this approach with a matrix decomposition-based completion model (MATRIX) and a prediction model founded on Long Short-Term Memory (LSTM) neural networks. The MATRIX model relies on the low-rank nature of matrices and known elements to decompose the target matrix into two lower-rank matrices, thereby reconstructing the missing entries [33]. Conversely, the LSTM-based model capitalizes on the predictive capabilities of neural networks for ETC gantry data restoration. Preliminary processing of the ETC gantry data revealed notable instances of missing information for January, which included discrepancies arising from incorrect data removal and delays in data transmission. These instances are encapsulated in Table 6.
In each missing data situation, two days of missing data were selected as the original data input, and the average time headway, $h^{t}_{i,j}$; traffic volume, $q_i$; and mean interval speed, $v^{s}_{(i,i+n)}$, were the traffic data to be completed. After the data are calibrated using the transaction data sheet, the data can be considered accurate and set to the true value. The Root Mean Square Error (RMSE) was selected as the evaluation index for this data completion, and the equation is as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right)^2}$$
where $\hat{x}_i$ is the value of the completed traffic data, $x_i$ is the true value of the traffic data, and $n$ is the number of completed traffic data points.
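The RMSE evaluation index can be computed with a few lines of standard-library Python. The function name `rmse` and the sample values are illustrative and are not taken from Table 7.

```python
import math


def rmse(completed, truth):
    """Root mean square error between completed values and true values."""
    if len(completed) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((c - t) ** 2 for c, t in zip(completed, truth))
    return math.sqrt(squared_error / len(truth))


# Hypothetical completed and true traffic values.
completed = [3.0, 5.0, 7.0]
truth = [3.0, 4.0, 9.0]
error = rmse(completed, truth)
```

Because the errors are squared before averaging, a single large deviation raises the index more than several small ones.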

Accuracy Evaluation
The traffic data completion error values for the MATRIX, LSTM, and IDTD models under various missing data scenarios are presented in Table 7, with a comparative analysis depicted in Figure 10.
On the basis of the above results, it can be seen that IDTD outperformed the other two algorithms for all types of traffic data completion under different missing data situations:
a. For all three algorithms, the completion error increased with the missing rate. MATRIX was the most sensitive to the missing rate, whereas IDTD was the least sensitive and could better adapt to severe missing data.
b. Among the selected traffic data, the time headway completion error was the largest, whereas the traffic volume and interval average speed completion errors were similar. The time headway was influenced by the driving environment and the independent choices of the driver. Its completion was therefore less effective with the low-dimensional MATRIX model, and among the baseline models only the time-series LSTM model was useful for it.

Evaluation of the Computational Complexity
The evaluation of the computational complexity in this study takes as its baseline the static tensor calculation model, in which real-time passage records are merged into the historical data and the calculation is repeated over the global data. For our ETC gantry tensor model, the number of gantries is denoted as $m$, the number of vehicles recorded historically is $n$, the number of updated vehicles is $\Delta n$, and the number of core tensor and factor matrix iterations in the static high-order ETC gantry tensor model is $O\left(m^2 (n + \Delta n)^2\right)$. The daily traffic volume on the G50 Nanjing highway is more than 20,000 vehicles; therefore, the computational load under the static tensor model is extremely large. If $D^{(i)}$ is the number of features of each point in the $i$th-dimensional space in the improved dynamic tensor decomposition model for ETC gantries, the high sparsity of the ETC gantry data leads to $D^{(i)} \ll \prod_i n^{(i)}$. Thus, the computational complexity of the factor matrix increment in each dimension in the conventional calculation is $O\left(n^{(i)} D^{(i)}\right)$, whereas the computational complexity of the factor matrix increment in the approximate tensor decomposition is only $O\left(D^{(i)}\right)$. The computational time complexities for updating the core tensor in the static high-order tensor decomposition and in IDTD are $O\left(T \sum_{i=1}^{3} \left(D^{(i)} (n^{(i)})^2 + (n^{(i)})^3\right)\right)$ and $O\left(T \sum_{i=1}^{3} \left(r^{(i)} n^{(i)} + 1\right) D^{(i)}\right)$, respectively, which shows that the computational complexity is significantly reduced.

Discussion
This paper's analysis indicates that anomalies in the ETC gantry data are not merely accidental but reflect systematic problems, ranging from technical failures to deliberate evasion strategies. We identified and corrected data anomalies based on the completion model. The initial repair strategy, including deletion, expansion, completion, and transformation, greatly improved the availability of data, ensuring that the subsequent analysis was based on data that were as accurate as possible.
The impact of weather and other environmental conditions on data quality cannot be ignored. The results for the G50 section indicate that adverse weather conditions, especially rainfall, can have a negative impact on the quality of ETC gantry data. This is a key area for future research, as developing more robust systems that can maintain high data quality regardless of environmental conditions is crucial for the reliability of traffic data analysis.
The IDTD model was used for tensor decomposition and compared with the MATRIX and LSTM models, demonstrating the potential of advanced computing techniques to address the challenges of data sparsity and complexity. The strong performance of the IDTD model in both accuracy and computational load indicates its potential applicability in a range of scenarios where large-scale sparse datasets are prevalent. This study lays the foundation for future research. It is necessary to further improve data preprocessing techniques, develop data collection systems that are more resilient to environmental factors, and explore the scalability of the IDTD model.

Conclusions
This paper introduced a high-order tensor model tailored for ETC gantry data that adeptly captures temporal and spatial structural information. Through the utilization of an Improved Dynamic Tensor Decomposition (IDTD) approach, we successfully addressed the challenges posed by the considerable sparsity of ETC gantry data. By employing a Laplacian matrix within the IDTD framework, we effectively mitigated computational burdens and preserved essential traffic characteristics in the data completion process. Our comparative analysis, illustrated through a case study, confirms the superior accuracy and reduced computational complexity of the IDTD-based algorithm over traditional methods. The IDTD method, therefore, holds substantial promise for comprehensive ETC gantry data completion and has the potential for broader application across diverse datasets and scenarios. Future research will explore extending the reach of IDTD in handling large-scale ETC gantry datasets and its adaptability to other data-intensive domains.

Figure 1. Traffic information extraction process for ETC gantry data.

Figure 2. Diagram of the three-dimensional tensor model of ETC gantry data.

Figure 2 visually represents the three-dimensional tensor model for ETC gantry data. In this model, the x-axis represents time, segmented into units of 4 s each, capturing the temporal aspect of traffic flow. The y-axis corresponds to the gantry ID, delineating the specific location of each gantry along the highway. The z-axis indicates the presence or absence of vehicles in different lanes at any given time. Distinct colors represent different lanes for ease of interpretation: orange for the first lane, yellow for the second, and green for the third. This color-coding aids in visualizing the distribution and movement of traffic across various lanes. The diagram exemplifies the model's capacity to encapsulate detailed information about vehicle flow, including temporal and spatial dynamics.

Equation (3) delineates the method for merging subtensor blocks: tensor blocks with different dimensions are superimposed to form a single tensor.

Figure 3. Diagrammatic representations of traffic data calculations based on the high-order tensor model: (a) layout of the high-order tensor model; (b) dissection of the high-order tensor model to analyze traffic data; (c) process of extracting specific vehicle information within the high-order tensor model for traffic.

Figure 4. Diagrams of the dynamic tensor flow decomposition: (a) tensor window diagram; (b) diagram of the tensor flow decomposition.

Figure 5. Dynamic tensor processing of ETC gantry traffic data.

Figure 6. Maximum and minimum tensor window values based on varying criteria.

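The matrix carrying the superscript $(m)$ and subscript 1, an element of $\mathbb{R}^{n_m \times r_m}$, comprises the leading $r_m$ eigenvectors of the corresponding covariance matrix $C_m$, for $m = 1, \dots, M$.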
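As a rough illustration of the eigenvector step described above, the sketch below unfolds a small tensor along each mode, forms a covariance matrix from the unfolding, and keeps the leading $r_m$ eigenvectors. The helper names (`unfold`, `leading_eigenvectors`), the random test tensor, the chosen ranks, and the use of an uncentered covariance are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-m unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def leading_eigenvectors(tensor: np.ndarray, mode: int, rank: int) -> np.ndarray:
    """Return an (n_m x rank) matrix of the top eigenvectors for one mode."""
    x_m = unfold(tensor, mode)
    # Assumption: uncentered covariance of the mode-m unfolding (n_m x n_m).
    cov = x_m @ x_m.T
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:rank]  # indices of the largest eigenvalues
    return eigvecs[:, order]


# Illustrative random tensor window and ranks (not real ETC data).
rng = np.random.default_rng(0)
window = rng.random((6, 5, 3))
ranks = (3, 2, 2)
factors = [leading_eigenvectors(window, m, r) for m, r in enumerate(ranks)]
```

Each returned matrix has orthonormal columns, so projecting the window onto these columns mode by mode yields a Tucker-style core tensor of size $r_1 \times \dots \times r_M$.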

Figure 7. The data quality of the G50 Nanjing section's ETC gantry for January.

Figure 8. Meteorological conditions in the Suzhou Area of the G50 Nanjing Section.

[...] detectors rely on optical principles, and factors such as dust, shadows, weather, and lighting can affect the detection results; hence, natural elements like rainy weather and nighttime conditions impact the accuracy of gantry detection. (4) The minimum distance between gantries in the same direction on the G50 Nanjing section is set at 400 m; two ETC gantries are spaced 500 m apart, and three road segments have relatively small distances between them. Figure 9 below shows the proportion of duplicate vehicle detection data for January among ETC gantry segments, with blue, red, and green representing the three closely spaced segments, and grey representing other segments. Duplicate detection issues are more significant in the three closely spaced segments.

Sensors 2024, 24, 86

Figure 9. Proportion of duplicate detection data within ETC gantry segments.

Figure 10. Traffic data completion comparison under different missing data scenarios.

Table 1. Numerical conversion factors for vehicle types.

Table 5. Causes and forms of anomalies in ETC gantry plate recognition data.

Table 6. Overview of missing ETC gantry data in January.

Table 7. RMSE values for traffic data completion.
