Semantic Multigranularity Feature Learning for High-Resolution Remote Sensing Image Scene Classification

Abstract: High-resolution remote sensing image scene classification is a challenging visual task due to the large intraclass variance and small interclass variance among categories. To accurately recognize the scene categories, it is essential to learn discriminative features from both global and local critical regions. Recent efforts encourage the network to learn multigranularity features by destroying the spatial information of the input image at different scales, which introduces meaningless edges that are harmful to training. In this study, we propose a novel method named the Semantic Multigranularity Feature Learning Network (SMGFL-Net) for remote sensing image scene classification. The core idea is to learn both global and multigranularity local features from rearranged intermediate feature maps, thus eliminating the meaningless edges. These features are then fused for the final prediction. Our proposed framework is compared with a collection of state-of-the-art (SOTA) methods on two fine-grained remote sensing image scene datasets: NWPU-RESISC45 and the Aerial Image Dataset (AID). We justify several design choices, including the branch granularities, fusion strategies, pooling operations, and the necessity of feature map rearrangement, through a comparative study. Moreover, the overall performance results show that SMGFL-Net consistently outperforms peer methods in classification accuracy, and its superiority is more apparent with less training data, demonstrating the efficacy of the feature learning in our approach.


Introduction
Remote sensing (RS) refers to the practice of observing, recording, measuring, and deriving information about the Earth's land and water surfaces using images acquired from an overhead perspective [1]. With the development of RS technology over the past decades, a tremendous number of high-resolution RS images have become available. Meanwhile, corresponding research efforts towards the intelligent understanding, identification, and classification of RS scene content have been actively investigated, because the quality of scene interpretation determines the effects of numerous downstream applications, such as urban planning, traffic control, and land resource management. Specifically, the goal of RS image scene classification is to assign a semantic label to an RS image patch. The task is complex and challenging due to interclass similarities and the variable shapes of ground objects. For example, Figure 1 shows some image samples from the NWPU-RESISC45 dataset. It can be observed that the scene samples of Medium Residential and Dense Residential are similar; further, among the four Palace samples, there are large variations in color, size, shape, and edge distributions. To better recognize these scenes, both global and local features are crucial. For the samples in Figure 1, global statistical features help distinguish Medium Residential from Dense Residential, and local features are essential for recognizing the Palace category. Early representative features, such as the Scale-Invariant Feature Transform (SIFT), Gabor filters, and the Histogram of Oriented Gradients (HOG), have been explored for RS image classification. Methods relying on these handcrafted low-level features perform well only on images with uniform spatial arrangements or texture, and they are limited in distinguishing RS images with more complex scenes. The rise of deep learning and related hardware advancements has revolutionized every industrial sector.
Powered by convolutional neural networks (CNNs) [2][3][4][5][6], traditional computer vision tasks like image classification and object detection have seen tremendous and rapid performance gains in the past ten years. CNNs have demonstrated an outstanding capability to discover intricate structures and discriminative information hidden in high-dimensional data, making them well suited to image data. Moreover, CNN models pretrained on large and open-domain datasets, such as ImageNet [7], can be transferred to any domain-specific dataset at the cost of only additional fine-tuning. However, the traditional CNN architectures, represented by AlexNet [2], VGG [3], Inception [5,6], and ResNet [4], are still limited in RS image scene classification due to the task-specific challenges mentioned above.
To address these challenges, recent CNN-based RS image scene classification methods mainly focus on how to extract discriminative features. For a fine-grained classification dataset such as NWPU-RESISC45, it would be effective to apply fine-grained visual categorization (FGVC) methods that aim to learn discriminative features either through the localization of critical regions [8][9][10][11][12] or via end-to-end feature encoding from the whole input image [13][14][15][16]. Besides, other methods focus on image data preprocessing [17,18]. Most of these prior efforts have achieved impressive performance on multiple FGVC datasets, such as CUB-Birds [19], Stanford Dogs, and Stanford Cars [20]. However, as shown in our experiments, these FGVC models do not achieve satisfying results in RS image scene classification. A promising direction to further boost classification accuracy is multigranularity discriminative feature learning, which aims to discover features at multiple scales along the convolutional network backbone. Prior efforts [21][22][23] have explored ways to mine multigranularity features that are fused and then fed into the detection head. In [22], a multigranularity progressive training framework is proposed to learn complementary features at different granularities. For each input image, a jigsaw puzzle generator [24] is adopted to partition the image into multiple patches, which are then randomly rearranged to form a reconstructed image. Training with multiple granularities encourages the network to operate at a patch level, where the patch size is specific to a particular granularity. However, the random rearrangement of the patches of the input image can introduce noise that is harmful to training. As shown in Figure 2, the random rearrangement operation destroys the local structure of the image, creating meaningless edges that prevent the model from learning low-level features.
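To make this concrete, the jigsaw-style patch rearrangement used by prior work on input images (and, in our method, applied to intermediate feature maps instead) can be sketched in PyTorch as follows; this is a minimal illustrative sketch under our own naming, not the implementation from [22] or [24].

```python
import torch

def jigsaw_shuffle(x: torch.Tensor, n: int) -> torch.Tensor:
    """Partition a batch of square images or feature maps (B, C, H, W)
    into an n x n grid of patches and randomly rearrange them.
    H and W must be divisible by n."""
    b, c, h, w = x.shape
    assert h % n == 0 and w % n == 0
    ph, pw = h // n, w // n
    # split into n*n patches: (B, C, n, ph, n, pw) -> (B, n*n, C, ph, pw)
    patches = (x.reshape(b, c, n, ph, n, pw)
                .permute(0, 2, 4, 1, 3, 5)
                .reshape(b, n * n, c, ph, pw))
    perm = torch.randperm(n * n)   # one permutation shared across the batch
    patches = patches[:, perm]
    # stitch the shuffled patches back into the original spatial size
    return (patches.reshape(b, n, n, c, ph, pw)
                   .permute(0, 3, 1, 4, 2, 5)
                   .reshape(b, c, h, w))

img = torch.randn(2, 3, 448, 448)
print(jigsaw_shuffle(img, 8).shape)   # torch.Size([2, 3, 448, 448])
```

Applied to an RGB image, the stitched result exhibits exactly the hard patch borders (meaningless edges) discussed above; applied to a high-level feature map, no low-level edge statistics are corrupted.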
In this paper, we propose a novel model named the Semantic Multigranularity Feature Learning Network (SMGFL-Net) for RS image scene classification. SMGFL-Net is an end-to-end framework that learns multigranularity features and fuses them for the final recognition. Instead of rearranging the input image, which destroys local shapes, SMGFL-Net adopts a patch-level rearrangement on the high-level feature maps. We conduct extensive experiments to compare SMGFL-Net with a wide range of peer methods on the NWPU-RESISC45 [25] and Aerial Image Dataset (AID) [26] datasets. Results show that SMGFL-Net can effectively learn and leverage global and local features at various granularities, thus achieving state-of-the-art (SOTA) performance.

Fine-Grained Object Recognition
Prior FGVC methods have focused on learning discriminative features either from critical regions or from the whole image. Yang et al. proposed the Navigator-Teacher-Scrutinizer Network (NTS-Net) [9], a multiagent cooperative learning framework, to identify critical regions. A localization subnetwork is adopted to compute the informativeness of subregions, and activations in the subnetwork correspond to subregions with different sizes and aspect ratios; the informative regions can then be selected through a custom loss function. Sun et al. [27] proposed a one-squeeze multiexcitation module to learn multiple attentive region features, which are then fed into a metric-learning framework with multiattention, multiclass constraints. Discriminative features can also be learned from the whole image via an end-to-end feature encoder. Lin et al. proposed the bilinear feature transformation [28], which allows CNNs to learn fine-grained details over a global image by calculating pairwise interactions between feature channels. Follow-up efforts, including compact bilinear pooling [14] and low-rank bilinear pooling [29], aimed to mitigate the computational cost caused by the exponential growth of feature dimensions in bilinear-CNN. Different from these FGVC methods, the proposed method focuses on learning features at multiple granularities to discover the global and local semantic meaning of the scenes in an RS image.
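As background for the bilinear methods above, a minimal sketch of classic bilinear pooling (pairwise channel interactions averaged over spatial positions, followed by the commonly used signed square root and L2 normalization) might look like the following; the function name and shapes are illustrative, not taken from [28].

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat: torch.Tensor) -> torch.Tensor:
    """Average the outer product of channel features over all spatial
    positions, then apply signed sqrt and L2 normalization."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    gram = torch.bmm(f, f.transpose(1, 2)) / (h * w)      # (B, C, C)
    vec = gram.reshape(b, c * c)
    vec = torch.sign(vec) * torch.sqrt(vec.abs() + 1e-8)  # signed sqrt
    return F.normalize(vec, dim=1)                        # unit L2 norm

x = torch.randn(4, 512, 14, 14)
print(bilinear_pool(x).shape)   # torch.Size([4, 262144])
```

The C × C output illustrates the "exponential growth of feature dimensions" that compact and low-rank bilinear pooling aim to mitigate.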

Multigranularity Feature Learning
A recently active line of research in FGVC is multigranularity feature learning [21][22][23]. In [21], a gradually enhanced strategy was introduced to learn multiple granularity-specific experts on limited fine-grained training data. A progressive multigranularity training framework was proposed by [22] to learn complementary features from different granularities, with the central idea of encouraging the network to learn multigranularity features from the whole image as well as the patches obtained and rearranged by a jigsaw puzzle generator. However, the meaningless edges caused by the image patches are harmful to effective feature learning. Thus, this study aims to address this problem by moving the patch generation from the original image to the intermediate feature maps.

Remote Sensing Image Scene Classification Methods
CNN-based methods have achieved impressive improvements in RS image scene classification. In [30], a feature-selection method was proposed based on the deep belief network (DBN); owing to its effectiveness in feature abstraction, this method works well on relatively small datasets. In [31], an adaptive deep pyramid matching (ADPM) model was proposed that takes advantage of information from all convolutional layers in the network; ADPM achieved superior performance on the 21-Class Land Use dataset [32] and the 19-Class Satellite Scene dataset [33]. Multiple convolutional layers were also used in [34] to compute a covariance matrix, each entry of which stands for the covariance of two feature maps, thereby exploiting the complementary information from different convolutional layers. As a novel architecture, the capsule network has also been applied to RS image scene classification: in [35], a capsule network was attached to a CNN without fully connected layers. In [36], a Multigranularity Multilevel Feature Fusion Branch (MGML-FFB) was proposed to extract multigranularity features through a custom module named the channel-separate feature generator (CS-FG); further, diversified predictions are provided by an ensemble module integrated into the network. Ke et al. conducted a series of studies on RS image scene classification. In [37], a multilayer feature fusion network was developed with a data augmentation approach integrated into training to improve the generalization ability of the model. In [38], the authors proposed a global-local dual-branch structure (GLDBS) that allows a network to explore global and local discriminative features. In addition to CNNs, Graph Convolutional Network (GCN)-based methods have also been explored: the paper [39] presented a deep feature aggregation framework driven by a graph convolutional network (DFAGCN) for high-spatial-resolution scene classification.
The framework utilizes a pretrained CNN to obtain multilayer features, which are fed into a GCN to capture patch-to-patch correlations between the feature maps. The output is then passed through a weighted concatenation operation and a linear layer to make a prediction. Our investigation shows that rearrangement at the feature map level has not been extensively studied for this task. In this study, the proposed method is evaluated on larger datasets, such as NWPU-RESISC45 [25], and some of the efforts mentioned above are used as baselines.

Semantic Multigranularity Feature Learning Network
This section presents the technical details of the proposed SMGFL-Net, an end-to-end deep network for RS image scene classification. Figure 3 shows the neural architecture of SMGFL-Net. At a high level, SMGFL-Net employs a CNN-based backbone that consists of multiple stages, where the feature maps within a stage are of the same size. In the training phase, the feature maps at different stages are partitioned into patches, rearranged, and then fed into subsequent convolution and classification blocks. This way, the network is encouraged to learn features at multiple granularities. In the inference phase, however, the intermediate feature maps are not rearranged, since the network has already been optimized to learn global and local features during training.

Network Design
Let F be the backbone feature extractor with L stages, which outputs feature maps of different sizes. Let F_i ∈ R^{C_i × H_i × W_i} be the feature map generated at stage S_i, where i = 1, 2, ..., L, and C_i, H_i, and W_i are the number of channels, the height, and the width of F_i, respectively. In the training phase, an intermediate feature map F_i is partitioned into multiple patches by a jigsaw puzzle generator and randomly rearranged to form a reconstructed feature map, denoted by F̃_i. In addition to the backbone network, we introduce a convolutional block B_conv and a classification block B_cls for each granularity-specific branch. B_conv is a two-step operation, as shown in Figure 3c: the first step is a 1 × 1 convolutional layer followed by batch normalization (BN) and a Rectified Linear Unit (ReLU) activation (Conv1×1 + BN + ReLU), and the second step is a Conv3×3 + BN + ReLU layer. The output of a convolutional block is then fed into a pooling block, which can be either max or average pooling. The pooling result is sent to a classification block B_cls, which consists of two fully connected layers with BN and a nonlinear activation, such as the Exponential Linear Unit (ELU) [40], to predict a probability distribution over the classes. Lastly, to fuse the features across branches, we define

V_fusion = [V_1; V_2; ...; V_L],  (1)

where V_i denotes the pooled feature vector of the branch at stage S_i and [;] refers to the fusion operator. We send V_fusion through another classification head to obtain a prediction result ŷ_fusion. In the experiments, different fusion strategies, including concatenation, summation, and multiplication, are evaluated against each other, and the optimal one is selected.
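A granularity-specific branch as described above (B_conv made of Conv1×1 + BN + ReLU and Conv3×3 + BN + ReLU, a pooling block, and a two-layer B_cls with BN and ELU) could be sketched in PyTorch as follows; the channel widths, hidden dimension, and class count (45, as in NWPU-RESISC45) are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, mid_ch: int, out_ch: int) -> nn.Sequential:
    """B_conv: Conv1x1 + BN + ReLU followed by Conv3x3 + BN + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 1, bias=False),
        nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

def cls_block(in_dim: int, hidden: int, num_classes: int) -> nn.Sequential:
    """B_cls: two fully connected layers with BN and an ELU nonlinearity."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.BatchNorm1d(hidden), nn.ELU(inplace=True),
        nn.Linear(hidden, num_classes),
    )

# one branch for, e.g., a stage-3 feature map with 512 channels:
branch = nn.Sequential(conv_block(512, 256, 1024),
                       nn.AdaptiveMaxPool2d(1), nn.Flatten(),
                       cls_block(1024, 512, 45))
```

Max pooling is shown here because it is the better-performing option in the comparative study (Section "Key Design Choices"); swapping in `nn.AdaptiveAvgPool2d(1)` gives the average-pooling variant.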

Training
In the training phase, a jigsaw puzzle generator is utilized to partition and rearrange the intermediate feature maps. This way, the meaningless edges can be eliminated. When the activation value in the feature map changes, the localization of its receptive field also changes. Further, as the network deepens, the feature map in later layers contains more semantic information and less spatial information.
Given a feature map F_i, the jigsaw puzzle generator partitions it into n_i × n_i equal-sized patches. The patches are then randomly shuffled and merged together into a new feature map F̃_i. Note that H_i and W_i should be integral multiples of n_i, and the granularity of the convolutional layer at stage S_i is given by H_i/n_i. Each F̃_i is subsequently passed through a convolutional block B_conv, a pooling block, and a classification block B_cls, which outputs a prediction result ŷ_i for the branch belonging to the i-th stage. Moreover, we let ŷ_0 be the prediction result of the trunk network (i.e., the backbone) and let y be the ground-truth label of an input image; the cost C per image can then be defined as the sum of the cross-entropy losses between each branch prediction and the ground truth, given in Equation (2):

C = Σ_i L_CE(ŷ_i, y),  (2)

where the sum runs over the trunk branch (i = 0) and all granularity-specific branches.
The overall loss function is a summation of the costs across all images in the training set.
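A minimal sketch of this multi-branch objective, assuming illustrative logit shapes and a 45-class problem:

```python
import torch
import torch.nn.functional as F

def per_image_cost(branch_logits, labels):
    """Equation (2): sum of cross-entropy losses between every branch
    prediction (trunk y_0 and each granularity branch y_i) and the
    ground-truth labels."""
    return sum(F.cross_entropy(p, labels) for p in branch_logits)

labels = torch.tensor([3, 7])                    # a mini-batch of two images
logits = [torch.randn(2, 45) for _ in range(4)]  # y_0, y_3, y_4, y_5
loss = per_image_cost(logits, labels)
```

Averaging (or summing) this cost over the training set gives the overall loss described above.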

Inference
In the inference phase, an input image is passed through the network in a feed-forward way without the feature map rearrangement step, because the network has been trained and its parameters have been optimized to discover features at multiple granularities. We consider two types of predictions in this paper. In the first case, only the fused result, defined in Equation (3), is used for the final prediction. Since ŷ_fusion is a normalized vector, the element with the highest confidence is selected and returned as the predicted category via the arg max operator. This case requires a smaller computational budget. In the second case, the fused result and the results given by each granularity branch are combined for the final prediction, defined in Equation (4). Due to the complementary nature of the granularity branches, the combined prediction can achieve better performance than the first case; this hypothesis is confirmed in the experiments.
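The two inference cases could be sketched as follows; the tiny two-class tensors are purely illustrative.

```python
import torch

def predict(branch_preds, fused_pred, combine=True):
    """Case 1 (combine=False): argmax of the fused prediction only.
    Case 2 (combine=True): argmax of the fused prediction plus the
    predictions of every granularity branch."""
    if not combine:
        return fused_pred.argmax(dim=1)
    return (fused_pred + sum(branch_preds)).argmax(dim=1)

fused = torch.tensor([[0.1, 0.9]])           # the fused head favors class 1
branches = [torch.tensor([[0.8, 0.2]]),      # both branches favor class 0
            torch.tensor([[0.9, 0.1]])]
print(predict(branches, fused, combine=False))  # tensor([1])
print(predict(branches, fused, combine=True))   # tensor([0])
```

The toy example shows how complementary branches can overrule the fused head when their combined evidence is stronger.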

Dataset
In the experiments, the proposed SMGFL-Net and the baselines are evaluated on the NWPU-RESISC45 [25] and AID [26] datasets.
• For NWPU-RESISC45, in Figure 4 we display two scene samples per category for all categories. Among these categories, there are a number of fine-grained scenes with small interclass variance and large intraclass variance. Additionally, both global and local features are necessary for recognizing certain categories. For example, as shown in Figure 1, images from the Palace and Medium Residential categories have similar global shapes, and there are big differences among images within the Palace category. The Medium Residential and Dense Residential samples have local similarities but differ in global characteristics such as spatial layout.
• As a dataset for aerial scene classification, AID is made up of the following 30 classes: airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct. The number of sample images varies across classes, from 220 up to 420; in all, there are 10,000 images within the 30 classes. Images in AID come from different remote imaging sensors, and the sample images of each class are carefully chosen from different regions around the world. AID is a challenging dataset due to its higher intraclass variation and smaller interclass dissimilarity. The pixel resolution in AID ranges from about 8 m to about half a meter, and the size of each image is fixed at 600 × 600 pixels.

Baselines
Baselines in our experiments could be divided into three categories: the methods that extract global features, the methods that extract local features, and the FGVC methods that localize critical regions and fuse global and local features. Some dedicated efforts on RS image scene classification are also taken into account for comparison. These methods are briefly introduced in this subsection.
• Global methods. The deep CNNs designed for generic image classification, such as VGG [3] and ResNet [4], are used as global methods in our experiments. These networks have achieved SOTA performance on large-scale datasets such as ImageNet [7].
• Local methods. Different from networks designed for generic object recognition, local CNN-based methods such as bilinear-CNN [16] capture orderless statistical information without spatial information.
• FGVC methods. In the experiments, we select several FGVC methods related to the design idea of the proposed SMGFL-Net. In NTS-Net [9], critical regions with different sizes and aspect ratios are automatically selected through a region proposal network, and local and global features are fused for recognition; ResNet50 is the backbone network of NTS-Net. DCL [17] encourages the backbone network to extract local features by destroying the spatial distribution of the training images. Another highly relevant study, Progressive-MG [22], learns multigranularity features through a jigsaw puzzle generator with different patch sizes in a progressive training scheme.
• RS image scene classification methods. The related methods in this category include the following: CNN-CapsNet [35] achieves better affine transformation invariance; InceptionV3-CapsNet [35] achieves the best performance on NWPU-RESISC45 when the training ratios are 0.1 and 0.2; multilayer stacked covariance pooling (MSCP) [34] is applied to the VGG16 backbone and consistently outperforms single-layer models on three challenging datasets; and MGML-FENet [36] has achieved SOTA performance on both NWPU-RESISC45 and AID.

Implementation Details
All of the experiments are performed using PyTorch on a cluster of GTX TITAN X GPUs. ResNet50 [4] is used as the backbone network of SMGFL-Net. At the training step, the input images are resized to 512 × 512 and then processed through a series of operations, such as random flipping, random cropping to 448 × 448, and normalization. The SMGFL-Net backbone contains five stages of feature maps with different sizes, of which we select stages three to five, a decision based on empirical results. The sizes of the feature maps at stages three, four, and five are 512 × 56 × 56, 1024 × 28 × 28, and 2048 × 14 × 14, respectively. The height and width of the feature maps at each stage should be integral multiples of the granularity specific to that stage. We use stochastic gradient descent (SGD) with momentum 0.9, weight decay 0.0005, and a base learning rate of 0.0002 for the pretrained parameters and 0.002 for the newly added parameters of the convolutional and classification blocks. The learning rates of the different parameter groups are reduced by following the cosine annealing schedule during training. The batch size is set to 32.
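The optimizer setup described above (two learning rates, momentum, weight decay, and cosine annealing) could be sketched as follows; the tiny stand-in modules replace the actual ResNet50 backbone and branch blocks, and T_max is an assumed epoch count.

```python
import torch
import torch.nn as nn

# stand-ins for the pretrained backbone and the newly added blocks
backbone = nn.Conv2d(3, 64, 7)       # would be a pretrained ResNet50
new_blocks = nn.Linear(2048, 45)     # conv + classification blocks

optimizer = torch.optim.SGD(
    [{"params": backbone.parameters(), "lr": 2e-4},    # pretrained params
     {"params": new_blocks.parameters(), "lr": 2e-3}], # new params
    momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```

Each parameter group keeps its own base learning rate, and `scheduler.step()` (called once per epoch) decays both rates along the cosine curve.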

Performance Metric
Since the datasets are well-balanced, we use accuracy (ACC) as the sole performance metric, which is also widely adopted in the literature [17,22,35]. Briefly, ACC measures the percentage of correct predictions on the test set. Formally, ACC = (TP + TN) / (test set size) × 100%, where TP and TN stand for the numbers of true positives and true negatives, the two cases of correct predictions.
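For a balanced multiclass test set, the metric reduces to the fraction of correct predictions; a minimal sketch:

```python
def accuracy(preds, labels):
    """Percentage of correct predictions over the test set."""
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return 100.0 * correct / len(labels)

print(accuracy([1, 2, 2, 0], [1, 2, 1, 0]))   # 75.0
```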

Key Design Choices
We evaluate four design choices in the proposed SMGFL-Net: (1) the pooling operation that connects the convolutional and classification blocks, (2) the granularity options, (3) the fusion strategy of the granularity-specific branches, and (4) training with vs. without the rearrangement of the feature maps. When evaluating each design choice, the accuracies of the two prediction types discussed in Section 3.3 are observed on both NWPU-RESISC45 and AID. The training ratio is set to 0.5 on NWPU-RESISC45 and 0.3 on AID. For the pooling operation, we consider average vs. max pooling. For the granularity options, we consider four granularity combinations applied to stages three, four, and five. Let G_Si denote the granularity at stage i in the network; for instance, G_Si = 8 means that the patch size in the rearranged feature map at stage i is 8 × 8 during training. The four granularity options we chose are {G_S3 = 28, G_S4 = 14, G_S5 = 7}, {G_S3 = 8, G_S4 = 4, G_S5 = 2}, {G_S3 = 7, G_S4 = 7, G_S5 = 7}, and {G_S3 = 2, G_S4 = 2, G_S5 = 2}. For the fusion strategies of the granularity-specific branches, concatenation, summation, and multiplication are evaluated. For evaluating the effectiveness of rearranging the feature maps, we keep the feature maps of each granularity-specific branch unchanged and compare against rearranging them during training. We conducted a randomized search over the combinations of these design options and obtained the best combination: {max pooling, {G_S3 = 8, G_S4 = 4, G_S5 = 2}, concatenation, with rearrangement}. To present the effect of a specific design choice, we fix the other choices at their optimal values. Results are reported in Tables 1-4. We provide the observations as follows.

• Pooling strategy. As shown in Figure 3, all of the global and granularity-specific branches contain pooling operations. Average pooling aggregates all of the information in a region, whereas max pooling keeps only the maximum value. Table 1 shows an ACC comparison between average and max pooling on both the NWPU-RESISC45 and AID datasets. It is observed that max pooling consistently outperforms average pooling by 0.2-0.9%. Since average pooling considers all pixels in a region, it tends to smooth out the image, so sharp features may not be identified. Max pooling, on the other hand, selects the brightest pixel in a region, which makes it useful when the image background is dark and the semantic features lie in lighter pixels. On both datasets, the majority of scene images have dark backgrounds with light semantic objects that distinguish the categories, which explains the better effect of max pooling.
• Branch granularities. Table 2 reflects the effect of the four granularity options, among which {G_S3 = 8, G_S4 = 4, G_S5 = 2} consistently outperforms the other options on both datasets. From the limited granularity options tried, we have the impression that a progressively decreasing patch size is favored to accommodate the feature maps, whose sizes also decrease stage-by-stage. To validate this point, however, more granularity options should be tried, or a heuristic could be developed to automate the search for an optimal or suboptimal set of granularities. We leave this as a meaningful future study, as described in Section 5.
• Fusion strategies. Table 3 shows the effect of the three fusion strategies, concatenation, summation, and multiplication, among which concatenation consistently outperforms the other options in all scenarios. The reason is that concatenation does not overwrite any information in the original features and yields a feature vector of higher dimension. Feature vectors of higher dimension increase the capacity of the subsequent classification modules and have stronger feature representation ability.
• Feature map rearrangement. Table 4 reports the effect of feature map rearrangement by comparing the models trained with vs. without it. A consistent performance gain is observed on both datasets. This is because rearranging the feature maps of different branches enables the network to learn subtle fine-grained features at different semantic levels; these features are likely complementary to each other and benefit the final scene classification.
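The three fusion strategies compared in Table 3 could be sketched as follows; the shapes are illustrative.

```python
import torch

def fuse(vectors, strategy="concat"):
    """Fuse pooled branch features by concatenation, summation,
    or element-wise multiplication."""
    if strategy == "concat":
        return torch.cat(vectors, dim=1)        # lossless, higher-dimensional
    if strategy == "sum":
        return torch.stack(vectors).sum(dim=0)  # same dimension, info merged
    if strategy == "mul":
        out = vectors[0].clone()
        for v in vectors[1:]:
            out = out * v                       # same dimension, info merged
        return out
    raise ValueError(f"unknown strategy: {strategy}")

vs = [torch.randn(2, 1024) for _ in range(3)]
print(fuse(vs, "concat").shape)   # torch.Size([2, 3072])
```

Only concatenation preserves every branch feature verbatim while tripling the dimension, which matches the explanation above for its superior accuracy.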

Results on NWPU-RESISC45
In the experiments, we compare SMGFL-Net with the global, local, FGVC, and RS scene classification methods introduced in Section 4.2. The results are shown in Table 5. The first two columns of the table display the evaluated method and its category, and the last three columns show the accuracy of the model trained under three training ratio settings, namely, 0.1, 0.2, and 0.5, which specify the ratio of the training set over the entire dataset. The observations are summarized as follows:
• The proposed SMGFL-Net presents the best overall accuracy compared with all of the benchmarks. Specifically, SMGFL-Net demonstrates the best accuracies of 91.9% and 96.5% with training ratios of 0.1 and 0.5, respectively. When the training ratio is 0.2, SMGFL-Net has the second-best accuracy of 93.7%, the same as InceptionV3-CapsNet, and is only 0.4% worse than Progressive-MG. Among the FGVC methods, Progressive-MG stands out, with the best accuracy when the training ratio is 0.2. From a design point of view, Progressive-MG also employs multigranularity feature extraction, but with a key difference from ours: its jigsaw puzzle generator is applied to the input image, leading to meaningless edges that may destroy certain features. Under the other training ratios, our method outperforms Progressive-MG by up to 0.6%, indicating that patch rearrangement at the feature map level is more effective for the RS scene classification task. On the other hand, despite a subnetwork specifically designed for localizing critical regions, NTS-Net still fails to achieve satisfying accuracy. When the training ratio reduces to 0.1, NTS-Net shows the worst accuracy of 84.5%; as the training ratio goes up to 0.5, its accuracy rises to 93.2%, meaning that NTS-Net needs more training data to reduce its prediction bias.
• Lastly, for the RS methods, VGG16-MSCP does not perform well, with an accuracy of less than 90% for training ratios of 0.1 and 0.2. InceptionV3-CapsNet, on the other hand, presents the same accuracy as our method with a training ratio of 0.2 but is worse than ours by 2.6% with a training ratio of 0.1. MGML-FENet, an effective RS image classification method, achieves the best performance when the training ratio is 0.2; however, SMGFL-Net achieves better performance when the training ratio is 0.1.

Results on AID
In the experiments, we compared SMGFL-Net with the same methods as in Section 4.6. The results are shown in Table 6. The first two columns of the table display the evaluated method and its category, and the last three columns show the accuracy of the model trained under three training ratio settings, namely, 0.1, 0.2, and 0.3, which specify the ratio of the training set over the entire dataset. The observations are summarized as follows: The proposed SMGFL-Net shows the best performance compared with all of the benchmarks. Specifically, SMGFL-Net achieves the best accuracies of 93.3%, 96.2%, and 97.2% with training ratios of 0.1, 0.2, and 0.3, respectively. Although MGML-FENet achieves the best accuracy with a training ratio of 0.2 on NWPU-RESISC45, SMGFL-Net outperforms it on AID. Among the global methods, ResNeXt50 achieves the best accuracy in this experiment. It is worth noting that ResNet50, the backbone network of SMGFL-Net, achieves 89.8%, 91.3%, and 93.7% with training ratios of 0.1, 0.2, and 0.3, whereas SMGFL-Net achieves 93.3%, 96.2%, and 97.2% under the same settings; this result demonstrates the effectiveness of the proposed framework. For the local methods, Bilinear-CNN achieves relatively better accuracy than its performance on NWPU-RESISC45. For the FGVC methods, Progressive-MG and MGML-FENet also achieve good performance, showing that multigranularity information extraction is important on both RS datasets. On the other hand, NTS-Net achieves better accuracy on AID than on NWPU-RESISC45.

Conclusions
In this paper, we propose a novel deep neural architecture named SMGFL-Net for RS image scene classification. At the training step, the proposed SMGFL-Net learns multigranularity features through a destruction operation applied to the intermediate feature maps by a jigsaw puzzle generator with different patch sizes, which avoids the meaningless edges appearing in prior studies. The proposed method is validated on the NWPU-RESISC45 and AID datasets. Results show that SMGFL-Net can effectively learn and leverage both global and local features at different granularities, demonstrating SOTA performance compared with several peer methods from the recent literature.
This study has the following limitations, which will be addressed in future work. First, the efficacy of the proposed SMGFL-Net is validated on only two datasets, namely, NWPU-RESISC45 and AID, and its capability on further RS image datasets has not been evaluated. Second, in the experiments of this study, the number of stages with multigranularity branches and the patch sizes are manually determined, which is not cost-effective and provides limited guidance on the choice of these parameters. It is thus essential to develop a heuristic that searches for an optimal or suboptimal set of parameters for a given problem instance in a systematic way. Third, the challenge of intraclass variance and interclass similarity could be addressed by other CNN design strategies, such as pairwise feature learning and bi-attention, which can be selectively integrated into SMGFL-Net.