Cross-Sensor Nighttime Lights Image Calibration for DMSP/OLS and SNPP/VIIRS with Residual U-Net

Remote sensing of nighttime lights (NTL) is widely used in socio-economic studies of economic growth, urbanization, stability of power grid, environmental light pollution, pandemics and military conflicts. Currently, NTL data are collected with two sensors: (1) Operational Line-scan System (OLS) onboard the satellites from the Defense Meteorology Satellite Program (DMSP) and (2) Visible Infrared Imaging Radiometer Suite (VIIRS) onboard the Suomi NPP (SNPP) and NOAA-20 satellites from the Joint Polar Satellite System (JPSS). However, the nighttime images acquired by these two sensors are incompatible in spatial resolution and dynamic range. To address this problem, we propose a method for the cross-sensor calibration with residual U-net convolutional neural network (CNN). The CNN produces DMSP-like NTL composites from the VIIRS annual NTL composites. The pixel radiances predicted from VIIRS are highly correlated with NTL observed with OLS (0.96 < R2 < 0.99). The method can be used to extend long-term series of annual NTL after the end of DMSP mission or to cross-calibrate same year NTL from different satellites to study diurnal variations.


Introduction
Defense Meteorology Satellite Program (DMSP) was launched in 1962 and since then its satellites with Operational Line-scan System (OLS) serve as a source of valuable nighttime light (NTL) data of the Earth surface. Starting from 1992 the DMSP satellites broadcast digital images, which were post-processed by the NOAA (currently Colorado School of Mines) Earth Observation Group (EOG) into global annual average and background removed NTL maps. With annual data stretch from 1992 to 2013, it makes DMSP Nighttime Lights the longest data series available for nocturnal remote sensing on human activities [1]. In 2011 Suomi NPP (SNPP) satellite with Visible Infrared Imaging Radiometer Suite (VIIRS) was launched. VIIRS instrument is also capable of detecting dim light sources at night. Annual VIIRS maps of nighttime lights are published from 2013 to 2019 [2]. NTL maps made with DMSP (DNL) or VIIRS (VNL) are widely utilized in research of human activity, economy and ecology [3].
Despite the value of the long-term archives from DMSP and VIIRS, their potential is not fully utilized. The main reason is that there is a significant difference between the OLS and VIIRS sensors (Table 1). VIIRS is significantly superior to OLS in terms of spatial, temporal, and radiometric resolution [4]. Due to the lack of calibration, DMSP data have no radiometric measurement units whilst VIIRS output is measured in nW/cm 2 /sr. Additionally, two satellite generations have different overpass times and thus they observe slightly different states of the city lights. The original DMSP annual NTL composites were produced using OLS data from six individual satellites F10, F12, F14, F15, F16 and F18. Over time, the orbits of each DMSP satellite gradually shift to an earlier overpass time, sliding from a day/night orbit to a dawn/dusk orbit. The OLS sensors collect sufficient nighttime data worldwide for annual NTL product generation as long as the overpass occurred later than 19:30. Recently, EOG has added additional years to the DMSP equatorial crossing time chart and discovered ( Figure 1) that the F15 satellite orbit had continued to shift and had started to collect pre-dawn nighttime data in 2012 and it continues through 2020. Satellite F16 has also collected usable nighttime data in the pre-dawn hours. Based on this new information, EOG has extended the time series of the annual DNL maps out to 2019 using pre-dawn data from F15 and F16.
Due to the lack of methods for cross-sensor image calibration from different generations of satellites, almost all NTL-related research in sociology, economy or ecology was limited to data from either DMSP or VIIRS [3]. Several attempts were made to crosscalibrate the NTL products from all the DMSP satellites [6] as well as both VIIRS and DMSP sensors [7][8][9]. They utilize different approaches like Pseudo-Invariant Feature paradigm (PIF) or Regression Models. The problem with most of these methods is that they assume that there is a linear or polynomial relationship between DMSP and VIIRS radiance, which is not always the case because the sensors have different spatial resolution, dynamic range and saturation effects.
Current literature was examined. Pandley et al. [6] found that for intercalibration of the DMSP data obtained from different satellites/in different years, the global-scale methods outperform the region-based calibration, none of the existing intercalibration methods applied to the data of two different satellites in one year gives the consistency of the values in all pixels either globally or on a regional scale, and the local brightness and proximity to the poles affects the quality of intercalibration. Li et al. [7] have proposed reference points to cross-calibrate the NTL maps from different years. The authors assume that all stable points on two cross-calibrated images are connected by robust linear regres-Remote Sens. 2021, 13, 5026 3 of 18 sion. Zheng et al. [8] proposed a method for constructing long-term consistent NTL time series for VIIRS and DMSP data and have built a set of cross-calibrated DMSP-like data from 1996 to 2017. They do not use the stable lights, but global radiance calibrated (GRC) DMSP-OLS products, because the wider dynamic range of the GRC data allowed them to avoid the saturation problem. However, GRC data is only available for several years. A logistic model was used to interpolate the GRC data into annual data from 1996 to 2013. In turn, VIIRS data is preprocessed using the BFAST time series decomposition algorithm to eliminate seasonal trends and noisy and unstable observations. For DMSP-like data, the authors used residuals corrected geographically weighted regression model (GWRc). Xing Tu et al. [9] proposed a new method for calibrating NTL sensors in relation to three Chinese megacities (Beijing, Shanghai, Guangzhou). As part of the DMSP data preprocessing, stepwise calibration, background noise removal and vegetation adjustment were used. Seasonal noise removal, yearly aggregation, background noise removal, vegetation adjustment and outliers correction are used to preprocess VIIRS data. To cross-calibrate the pixel values, they used a power-law regression found on stable lights in the years of overlap of VIIRS and DMSP observations. Chen et al. [10] have constructed an extended annual series (2000-2018) of VIIRS-like NTL data. Instead of lowering the quality of the input data, Chen et al. attempted to improve them by converting the DMSP data into the VIIRS projection and radiance range. To obtain additional information for such an improvement, they used vegetation index and auto-encoder neural network model. According to the authors, the result has a good regression consistency both at the pixel level (R 2 = 0.87) and at the city level (R 2 = 0.95). Li et al. [11] have generated a homogenized NTL dataset at the global scale for 1992-2018 by intercalibrating the DMSP data between the years/satellites, finding a sigmoid regression between the VIIRS and the intercalibrated DMSP nighttime lights and finally transforming the VIIRS data into DMSP-like with this regression. methods applied to the data of two different satellites in one year gives the consistency of the values in all pixels either globally or on a regional scale, and the local brightness and proximity to the poles affects the quality of intercalibration. Li et al. [7] have proposed reference points to cross-calibrate the NTL maps from different years. The authors assume that all stable points on two cross-calibrated images are connected by robust linear regression. Zheng et al. [8] proposed a method for constructing long-term consistent NTL time series for VIIRS and DMSP data and have built a set of cross-calibrated DMSP-like data from 1996 to 2017. They do not use the stable lights, but global radiance calibrated (GRC) DMSP-OLS products, because the wider dynamic range of the GRC data allowed them to avoid the saturation problem. However, GRC data is only available for several years. A logistic model was used to interpolate the GRC data into annual data from 1996 to 2013. In turn, VIIRS data is preprocessed using the BFAST time series decomposition algorithm to eliminate seasonal trends and noisy and unstable observations. For DMSPlike data, the authors used residuals corrected geographically weighted regression model (GWRc). Xing Tu et al. [9] proposed a new method for calibrating NTL sensors in relation to three Chinese megacities (Beijing, Shanghai, Guangzhou). As part of the DMSP data preprocessing, stepwise calibration, background noise removal and vegetation adjustment were used. Seasonal noise removal, yearly aggregation, background noise removal, vegetation adjustment and outliers correction are used to preprocess VIIRS data. To cross-calibrate the pixel values, they used a power-law regression found on stable lights in the years of overlap of VIIRS and DMSP observations. Chen et al. [10] have constructed an extended annual series (2000-2018) of VIIRS-like NTL data. Instead of lowering the quality of the input data, Chen et al. attempted to improve them by converting the DMSP data into the VIIRS projection and radiance range. To obtain additional information for such an improvement, they used vegetation index and autoencoder neural network model. According to the authors, the result has a good regression consistency both at the pixel level (R 2 = 0.87) and at the city level (R 2 = 0.95). Li et al. [11] have generated a homogenized NTL dataset at the global scale for 1992-2018 by intercalibrating the DMSP data between the years/satellites, finding a sigmoid regression between the VIIRS and the intercalibrated DMSP nighttime lights and finally transforming the VIIRS data into DMSP-like with this regression. This work aims to create a new method for synthesizing a map similar to the DMSP annual NTL map from the corresponding VIIRS product using the U-net Convolutional Neural Network (CNN) [12] and deep residual learning strategy [13]. This architecture achieves high accuracy of the VIIRS-based prediction of the DMSP NTL "brightness" with a relatively low volume of training data. It relies on the data augmentation to use the The question arises-why are we "downgrading" VIIRS products to DMSP, and not vice versa, as getting the result of the NTL maps of higher quality, which is VNL, seems more relevant? The main reason is that the saturation limit of the VIIRS sensor is much higher than that of the DMSP sensor. Therefore, for those brightness values that exceed the DMSP saturation threshold, there is nowhere to take the values. These synthetic DNL maps still can be used to extend the long-term NTL-related research in the time span 1991-2020, to cross-calibrate DMSP F15 and F16 product with the reference to the synthetic NTL maps from VIIRS and to study diurnal variations of the city lights observed 3-4 times at night with multiple satellites (F18, F15, F16, SNPP and NOAA-20).

Method
For the cross-calibration of the images with nighttime lights from the two satellite sensors, we utilize an artificial neural network with U-net architecture and residual blocks. U-net was first proposed for the task of biomedical image segmentation [12] but later its usage was expanded to other computer vision tasks. We train the network on the pair of annual NTL composite images, obtained for the same year with different satellite sensors. Then we can present to the network a VIIRS annual composite from another year, and it will generate a DMSP annual composite, which will be similar to one derived directly from the DMSP images.

Data Preparation
For network training we use the recently published annual stable lights composite from SNPP/VIIRS version 2.0 [2,14] and DMSP F15 NTL composite for the same year. The accompanying DMSP F15 NTL data for years 2013-2019 is available for end user from EOG web site. Research paper on the composition of the extended DMSP NTL series is submitted to the same Special Issue of Remote Sensing journal. For validation, we present to the network the SNPP/VIIRS version 2.0 composites for years 2013-2015 and 2017-2019, and then compare the network output with the "real" DMSP F15 NTL composite for the same year. For both DMSP and VIIRS data products, we have used the masked average annual radiance with background, biomass burning and aurora zeroed out. The geolocation algorithm for VIIRS Day-night band has far more precise control of its surface footprint, thus the DMSP F15 NTL series were resampled VIIRS DNB grid of 2016 using with the maximum cross-correlation and best-fit translation technique. The resampling of the VIIRS DNB grid from 15 arc-seconds to 30 arc-second was necessary so that both the DMSP and VIIRS grids were in the same resolution of 30 arc-seconds As was stated earlier, the data collected by different generations of satellites varies greatly in dynamic range. Data collected from VIIRS has a very large dynamic range extending over seven orders of magnitude [15], and the DMSP data has a significantly lower dynamic range of 6 bits. VIIRS data is the neural network input, and DMSP-like data is the output.
In addition, DMSP satellites have a peculiarity: sensors of this generation tend to saturate. Starting from certain values in the VIIRS data, the DMSP data corresponding to the same areas takes on a value of 63 and does not grow further. Moreover, this effect occurs at rather low values of the input data dynamic range. To simulate this effect and to help the neural network to learn better, the input VIIRS data was truncated with a threshold value of 2000 nW/cm 2 /sr. The threshold value was chosen to be equal to the upper limit of the possible NTL radiance values in cities. For example, one of the brightest cities in the world, Las Vegas, has the mean annual cloud-free radiance around 2000 nW/cm 2 /sr [16]. After truncation, the input and the output data are normalized to have values in a range from 0 to 1. After this trimming and scaling, the NTL brightness information for cities is preserved and the information for brighter places, such as gas flares, is limited, while the overall quality of the network output is improved.

U-Net Architecture
The main building block of our network is a residual block ( Figure 2). Its architecture is taken from He's original paper [13] with slight modification to better fit our goal. Each block consists of two 3 × 3 convolutions with a ReLU activation. Every convolution layer is followed by the dropout layer with the dropout rate of 0.2. This way we regularize the network and prevent it from overfitting. We use instance normalization [17] instead of commonly used batch normalization because it was shown that the instance normalization works better on image generation tasks. In order to match the dimension channel in the shortcut connection output and the output of convolutions, we use 1 × 1 convolution with a linear activation function on the shortcut connection output.
commonly used batch normalization because it was shown that the instance normalization works better on image generation tasks. In order to match the dimension channel in the shortcut connection output and the output of convolutions, we use 1 × 1 convolution with a linear activation function on the shortcut connection output.
The overall architecture of our U-net model is shown in Figure 3. It consists of 5 levels; on each level, there are 3 residual blocks. On the downsampling path, each level's output is saved and later concatenated to the input of its counterpart level on the upsampling path. The resolution of the feature tensor changes only when the level is changed. In order to perform the resolution change, max pooling and upsampling layers are used on downsampling and upsampling paths respectively. The final output of the network is activated by ReLU with a threshold of 1.
In order to find the optimal architecture of the neural network, we have tested different hyperparameters such as the number of residual blocks, the depth of the network and the number of feature channels after each block. Our goal was to maximize the quality of the resulting images using as few learnable parameters as possible. Our experiments have shown that decreasing the size of the network results in a significant drop in the quality of produced images and increasing the size of the network has a negative effect on the performance with marginal improvement in the quality.  The overall architecture of our U-net model is shown in Figure 3. It consists of 5 levels; on each level, there are 3 residual blocks. On the downsampling path, each level's output is saved and later concatenated to the input of its counterpart level on the upsampling path. The resolution of the feature tensor changes only when the level is changed. In order to perform the resolution change, max pooling and upsampling layers are used on downsampling and upsampling paths respectively. The final output of the network is activated by ReLU with a threshold of 1.
In order to find the optimal architecture of the neural network, we have tested different hyperparameters such as the number of residual blocks, the depth of the network and the number of feature channels after each block. Our goal was to maximize the quality of the resulting images using as few learnable parameters as possible. Our experiments have shown that decreasing the size of the network results in a significant drop in the quality of produced images and increasing the size of the network has a negative effect on the performance with marginal improvement in the quality. . U-net architecture is used to perform cross-calibration from VIIRS data to the analog of DMSP data. ResBlockN xM is M residual blocks stacked one after another, every convolutional layer in these blocks produces N feature channels; MaxPool is a max pooling layer that is used to reduce the resolution of features; UpSample is an upsampling layer, it is used to increase the resolution of features; Concat is a concatenation operation, it is performed on channels axis; the last 3 layers prior to output are instance normalization, 1 × 1 convolution and ReLU activation with a threshold 1.

Training Flow
The network is trained on multiple 256 × 256 patches taken from a single annual VIIRS NTL image. In order to get patches containing actual lights rather than a background, a quantile filter is used. This filter works as follows: it selects all pixels with intensity higher than a certain quantile of the radiance distribution within the VIIRS NTL image. Then patches containing only the selected pixels are produced. By having this filter, we can control the minimum radiance of the lights the network is trained on. For example, the quantile of 0.01 is used to select patches from low-radiance regions like the countryside, but when the quantile of 0.6 is used the patches will include major cities, gas flares, etc. So, we use the quantile filter to change the radiance of the data fed to the network on each epoch. This approach helps us to greatly reduce the oversaturation problem.
The training process consists of 20 epochs; training for more epochs increases test error. We gradually increase the quantile from 0.01 to 0.6 as an epoch number increases. Each epoch consists of 10 mini-epochs. On every mini-epoch, 1024 patches are generated by a quantile filter. Then data augmentation is performed to increase the number of patches in the training set. For augmentation we use rotations and flips.
We use Mean Absolute Error (MAE) as a loss function: where , is a "ground truth" value of the m-th pixel in the n-th image in the training batch (from the annual DMSP NTL data), , is a value of the m-th pixel in the n-th image in the batch predicted by the network from VIIRS data, N is the total number of images in the batch, M is the total number of pixels in one image (patch).
We use Adam stochastic optimizer [18] with the default settings except for the learning rate. We use a cyclical exponential decay learning rate (2) with the initial learning rate =0.001 and we configure it so that at the end of the epoch it will have a learning rate =0.0002. At every epoch start, it is reset to the initial learning rate. . U-net architecture is used to perform cross-calibration from VIIRS data to the analog of DMSP data. ResBlockN xM is M residual blocks stacked one after another, every convolutional layer in these blocks produces N feature channels; MaxPool is a max pooling layer that is used to reduce the resolution of features; UpSample is an upsampling layer, it is used to increase the resolution of features; Concat is a concatenation operation, it is performed on channels axis; the last 3 layers prior to output are instance normalization, 1 × 1 convolution and ReLU activation with a threshold 1.

Training Flow
The network is trained on multiple 256 × 256 patches taken from a single annual VIIRS NTL image. In order to get patches containing actual lights rather than a background, a quantile filter is used. This filter works as follows: it selects all pixels with intensity higher than a certain quantile of the radiance distribution within the VIIRS NTL image. Then patches containing only the selected pixels are produced. By having this filter, we can control the minimum radiance of the lights the network is trained on. For example, the quantile of 0.01 is used to select patches from low-radiance regions like the countryside, but when the quantile of 0.6 is used the patches will include major cities, gas flares, etc. So, we use the quantile filter to change the radiance of the data fed to the network on each epoch. This approach helps us to greatly reduce the oversaturation problem.
The training process consists of 20 epochs; training for more epochs increases test error. We gradually increase the quantile from 0.01 to 0.6 as an epoch number increases. Each epoch consists of 10 mini-epochs. On every mini-epoch, 1024 patches are generated by a quantile filter. Then data augmentation is performed to increase the number of patches in the training set. For augmentation we use rotations and flips.
We use Mean Absolute Error (MAE) as a loss function: where y true n,m is a "ground truth" value of the m-th pixel in the n-th image in the training batch (from the annual DMSP NTL data), y pred n,m is a value of the m-th pixel in the n-th image in the batch predicted by the network from VIIRS data, N is the total number of images in the batch, M is the total number of pixels in one image (patch).
We use Adam stochastic optimizer [18] with the default settings except for the learning rate. We use a cyclical exponential decay learning rate (2) with the initial learning rate = 0.001 and we configure it so that at the end of the epoch it will have a learning rate = 0.0002. At every epoch start, it is reset to the initial learning rate.
where lr is a current learning rate, lr init is an initial learning rate, d is a decay rate. The whole training process takes about 10 h to complete on the NVIDIA Tesla V-100 graphics card. The training was conducted with Google Colab notebooks using virtual machine instances with GPU in Google cloud. Tensorflow machine learning platform was used to build and train the CNN.

Generation of Results
Since the whole composite of VIIRS data takes about 3 Gb, it is impossible to use the trained network directly on the data to produce DMSP-like imagery. Instead, we divide VIIRS data into overlapping patches of the size 256 × 256 with each pixel being covered by at least 4 patches and predict on these patches. It is necessary to use overlapping patches because of the visual errors, which may occur on the edges of non-overlapping patches. Then these overlapping patches are multiplied by a 256 × 256 filter with normally distributed coefficients and added together. The coefficients are added to the array of the same size as the full DMSP-like composite. Finally, the image obtained by adding overlapping patches is divided by the coefficients array to normalize the imagery to the original distribution of radiance values. This technique allows for the generation of DMSPlike imagery on a GPU with limited memory. It also tackles the problem of visual artefacts at the edges of the non-overlapping patches, yielding the result that has no visual artifacts presented by stitching.

Result
To test our model, we took certain points of interest, namely Las Vegas, Los Angeles, Moscow, and a gas flare in Venezuela (63.5635E, 9.6419N), which is the brightest point in the 2015 VIIRS data. This data was not seen by a network so it suits well for a testing purpose. For these spots, scatter plots were made. They can be seen in Figure 4. The plots on the Y-axis contain the pixel values for the original DMSP data, and on the X-axis the corresponding pixel values obtained using the neural network are shown. Linear regression and its 95% confidence interval have also been plotted to show the ongoing trends. Fraction shows the likelihood of the predicted pixel to have a certain value compared to the original DMSP value for that pixel. where lr is a current learning rate, lrinit is an initial learning rate, d is a decay rate.
The whole training process takes about 10 h to complete on the NVIDIA Tesla V-100 graphics card. The training was conducted with Google Colab notebooks using virtual machine instances with GPU in Google cloud. Tensorflow machine learning platform was used to build and train the CNN.

Generation of Results
Since the whole composite of VIIRS data takes about 3 Gb, it is impossible to use the trained network directly on the data to produce DMSP-like imagery. Instead, we divide VIIRS data into overlapping patches of the size 256 × 256 with each pixel being covered by at least 4 patches and predict on these patches. It is necessary to use overlapping patches because of the visual errors, which may occur on the edges of non-overlapping patches. Then these overlapping patches are multiplied by a 256 × 256 filter with normally distributed coefficients and added together. The coefficients are added to the array of the same size as the full DMSP-like composite. Finally, the image obtained by adding overlapping patches is divided by the coefficients array to normalize the imagery to the original distribution of radiance values. This technique allows for the generation of DMSP-like imagery on a GPU with limited memory. It also tackles the problem of visual artefacts at the edges of the non-overlapping patches, yielding the result that has no visual artifacts presented by stitching.

Result
To test our model, we took certain points of interest, namely Las Vegas, Los Angeles, Moscow, and a gas flare in Venezuela (63.5635E, 9.6419N), which is the brightest point in the 2015 VIIRS data. This data was not seen by a network so it suits well for a testing purpose. For these spots, scatter plots were made. They can be seen in Figure 4. The plots on the Y-axis contain the pixel values for the original DMSP data, and on the X-axis the corresponding pixel values obtained using the neural network are shown. Linear regression and its 95% confidence interval have also been plotted to show the ongoing trends. Fraction shows the likelihood of the predicted pixel to have a certain value compared to the original DMSP value for that pixel.   To evaluate the quality of image translation more thoroughly, we have calculated the determination coefficient (R2) shown in Table 2. These coefficients are very close to 1 for all the spots, showing that the model predicts results that are close to the original DMSP data. Additionally, the same 2D histograms were plotted with a median, 2.5 percentile, and 97.5 percentile instead of linear regression slope and confidence intervals. They can be seen in Figure 5. We have made the same plots with linear regression and the median for the whole world. These can be accessed in Supplementary Materials. The overall R2 for the world DMSP-like and F15 radiances in 2015 is 0.94.
One of the most significant achievements of our model is that it yields images that visually appear very close to the DMSP data. This has never been achieved before in the previous papers on this topic. A comparison between the DMSP data and data that is generated by our model is presented in Figure 6. At first glance, it is almost impossible to distinguish between them and the difference lies in subtle things like a glow in the Los Angeles Bay area, which is slightly fainter in data generated by our model compared to the DMSP data. The model captured all main features of DMSP data like saturation in images and blur effect, generating these features from clear, high resolution, higher dynamic range images gathered from the VIIRS sensor. To evaluate the quality of image translation more thoroughly, we have calculated the determination coefficient (R2) shown in Table 2. These coefficients are very close to 1 for all the spots, showing that the model predicts results that are close to the original DMSP data. Additionally, the same 2D histograms were plotted with a median, 2.5 percentile, and 97.5 percentile instead of linear regression slope and confidence intervals. They can be seen in Figure 5. We have made the same plots with linear regression and the median for the whole world. These can be accessed in Supplementary Materials. The overall R2 for the world DMSP-like and F15 radiances in 2015 is 0.94. One of the most significant achievements of our model is that it yields images that visually appear very close to the DMSP data. This has never been achieved before in the previous papers on this topic. A comparison between the DMSP data and data that is generated by our model is presented in Figure 6. At first glance, it is almost impossible to distinguish between them and the difference lies in subtle things like a glow in the Los Angeles Bay area, which is slightly fainter in data generated by our model compared to the DMSP data. The model captured all main features of DMSP data like saturation in images and blur effect, generating these features from clear, high resolution, higher dynamic range images gathered from the VIIRS sensor. Remote Sens. 2021, 13, x FOR PEER REVIEW9 of 20   To assess the quality of our model from one more perspective, we have plotted histograms for our test spots. In these histograms, we examine how values of original DMSP data and predicted from VIIRS data are distributed. Histograms are shown in Figures 7 and 8. The difference between these figures is that in Figure 8 data is plotted for mid-high range pixel values. One can see that distribution of values is quite close between original and predicted data but there are some differences. In Figure 7 the most obvious difference occurs near values close to 0. Neural Network predicts values 1-2 when the background removed annual DMSP images do not have these values. It happens because when DMSP data is processed the low-light values 1-2 are set to background 0 and the NN did not catch this peculiarity.  range pixel values. One can see that distribution of values is quite close between original and predicted data but there are some differences. In Figure 7 the most obvious difference occurs near values close to 0. Neural Network predicts values 1-2 when the background removed annual DMSP images do not have these values. It happens because when DMSP data is processed the low-light values 1-2 are set to background 0 and the NN did not catch this peculiarity.  Original DMSP data is plotted in blue, data that was predicted by our model is plotted in orange. Probability shows the number of pixels with that value divided by the number of all the pixels. Pixels with value 0 are truncated.

Discussion
The main strengths of our approach are visual results that have never been achieved before and the good metrics performance of the model. The quality of produced images is sufficient for them to be visually indistinguishable from the real DMSP data. The determination coefficient of the model is very close to 1 in all the test spots; distribution of the generated data in the midrange pixel values is similar to the original DMSP. However, the

Discussion
The main strengths of our approach are visual results that have never been achieved before and the good metrics performance of the model. The quality of produced images is sufficient for them to be visually indistinguishable from the real DMSP data. The determination coefficient of the model is very close to 1 in all the test spots; distribution of the generated data in the midrange pixel values is similar to the original DMSP. However, the questions may arise regarding our strategy to downgrade high quality data from the modern satellites instead of upgrading the old one and to use the local context around the predicted pixel with a non-linear NN-based image filter instead of the much simpler pixel-to-pixel regression.

Why Are We Converting VIIRS to DMSP Radiances and Not Vice Versa?
We are converting VIIRS radiances to DMSP DNs for two reasons: (1) the saturation limit of the VIIRS sensor is higher than that of the and (2) spatial resolution of the VIIRS Remote Sens. 2021, 13, 5026 13 of 18 sensor is five times better than that of the DMSP. For those brightness values that exceed the saturation threshold DMSP, there is nowhere to take values. It is possible to filter out (smooth) high spatial frequencies from the VIIRS image, but there is no deterministic way to add detail to the DMSP one. This can be shown by the comparison with the results presented in [10]. In this work, an extended time series (2000-2018) of nighttime data similar to NPP-VIIRS was constructed using cross-sensor calibration using annual DMSP-OLS data for 2000-2012. and monthly SNPP-VIIRS data for 2013-2018. A key feature of this approach was the improvement of the image by using the vegetation index and an artificial neural network with an autoencoder model. The result of this work is a set of annual NTL files in open access.
We have compared the results of Chen et al. [10] for 2015 with the annual VNL v2.0 data for 2015 published by EOG [2]. We also have made a similar comparison for the result of our 2015 cross-satellite calibration, comparing it with the annual DMSP data for 2015. Figure 9 shows the percentage histograms of the pixel brightness in Sicily. We chose Sicily for cross-comparison, since in this region the brightness of the night lights changes little throughout the year [19]. The percentile histogram does not show the absolute number of values that fall into the bin but the percentage of their total number. We switched to percentiles because the data [2,10] at 15 arcsec. Latitude-longitude grid are in slightly different projections: 0.0045 deg. step vs. 0.0042 deg. We see (right histogram) that the higher the brightness, the greater the discrepancy in the number of pixels between the Chen's data [10] and the VIIRS NTL. At the same time (left histogram), no such deviation is observed for the DMSP radiances predicted with NN from VIIRS NTL. ern satellites instead of upgrading the old one and to use the local context around the predicted pixel with a non-linear NN-based image filter instead of the much simpler pixelto-pixel regression.

Why Are We Converting VIIRS to DMSP Radiances and Not Vice Versa?
We are converting VIIRS radiances to DMSP DNs for two reasons: (1) the saturation limit of the VIIRS sensor is higher than that of the and (2) spatial resolution of the VIIRS sensor is five times better than that of the DMSP. For those brightness values that exceed the saturation threshold DMSP, there is nowhere to take values. It is possible to filter out (smooth) high spatial frequencies from the VIIRS image, but there is no deterministic way to add detail to the DMSP one. This can be shown by the comparison with the results presented in [10]. In this work, an extended time series (2000-2018) of nighttime data similar to NPP-VIIRS was constructed using cross-sensor calibration using annual DMSP-OLS data for 2000-2012. and monthly SNPP-VIIRS data for 2013-2018. A key feature of this approach was the improvement of the image by using the vegetation index and an artificial neural network with an autoencoder model. The result of this work is a set of annual NTL files in open access.
We have compared the results of Chen et al. [10] for 2015 with the annual VNL v2.0 data for 2015 published by EOG [2]. We also have made a similar comparison for the result of our 2015 cross-satellite calibration, comparing it with the annual DMSP data for 2015. Figure 9 shows the percentage histograms of the pixel brightness in Sicily. We chose Sicily for cross-comparison, since in this region the brightness of the night lights changes little throughout the year [19]. The percentile histogram does not show the absolute number of values that fall into the bin but the percentage of their total number. We switched to percentiles because the data [2,10] at 15 arcsec. Latitude-longitude grid are in slightly different projections: 0.0045 deg. step vs. 0.0042 deg. We see (right histogram) that the higher the brightness, the greater the discrepancy in the number of pixels between the Chen's data [10] and the VIIRS NTL. At the same time (left histogram), no such deviation is observed for the DMSP radiances predicted with NN from VIIRS NTL.  When analyzing temporal dynamics of city lights, errors in the highest brightness values can lead to significant inaccuracies. We performed the following analysis: (1) randomly selected 13 large cities with bright lighting (Paris, Madrid, Lisbon, Prague, Minsk, Moscow, Shanghai, Tokyo, Las Vegas, Mexico City, Rome, New Delhi, Wuhan); (2) took squares of size 0.2 × 0.2 degrees in the city downtown; (3) for each of the four datasets, the values of the brightness of the pixels were averaged in the selected squares; (4) calculated the ratio of the brightness of the predicted data to the original; (5) made the scatterplots of the predicted vs. observed ratios ( Figure 10). The scatterplots show that the average brightness for city centers according to [10] can be two times lower compared to the VIIRS NTL v2.0. At the same time, the ratio of the NN predicted DMSP brightness to the observed DMSP NTL for city centers is close to 1.
the ratio of the brightness of the predicted data to the original; (5) made the scatterplots of the predicted vs. observed ratios ( Figure 10). The scatterplots show that the average brightness for city centers according to [10] can be two times lower compared to the VIIRS NTL v2.0. At the same time, the ratio of the NN predicted DMSP brightness to the observed DMSP NTL for city centers is close to 1. We have also noticed that the background (no lights) in [10] is filtered with a higher threshold compared to VIIRS NTL v2.0. For example, in Sicily ( Figure 11) the NTL map built from [10] has a significantly larger area of the background (highlighted in green) compared to VIIRS NTL. Contrary to that, the background area for the NN prediction of DMSP-like radiances is 4% smaller than the background of the annual DMSP NTL in 2015. We have also noticed that the background (no lights) in [10] is filtered with a higher threshold compared to VIIRS NTL v2.0. For example, in Sicily ( Figure 11) the NTL map built from [10] has a significantly larger area of the background (highlighted in green) compared to VIIRS NTL. Contrary to that, the background area for the NN prediction of DMSP-like radiances is 4% smaller than the background of the annual DMSP NTL in 2015.

Can We Use Pixel-to-Pixel Regression for Converting VIIRS Radiance to DMSP?
We have compared our result with the results obtained in [11]. As part of this work, the authors intercalibrated the DMSP data between years (for the years when the DMSP observations were made) and simulated the annual DMSP-like radiances from VIIRS (for those years in which there were no DMSP observations) with sigmoid regression. The result of this work is also a set of annual NTL files in open access. We took the simulation result for 2015 from [11] and compared it with the annual DMSP F15 NTL map for 2015 (link to open annual DMSP data for 2015 or the form where this data can be ordered). For comparison, we plotted histogram of pixel brightness in Sicily resulting from both crosscalibration methods in Figure 12. It shows that the consistency of histograms is higher for our approach.

Can We Use Pixel-to-Pixel Regression for Converting VIIRS Radiance to DMSP?
We have compared our result with the results obtained in [11]. As part of this work, the authors intercalibrated the DMSP data between years (for the years when the DMSP observations were made) and simulated the annual DMSP-like radiances from VIIRS (for those years in which there were no DMSP observations) with sigmoid regression. The result of this work is also a set of annual NTL files in open access. We took the simulation result for 2015 from [11] and compared it with the annual DMSP F15 NTL map for 2015 (link to open annual DMSP data for 2015 or the form where this data can be ordered). For comparison, we plotted histogram of pixel brightness in Sicily resulting from both cross-calibration methods in Figure 12. It shows that the consistency of histograms is higher for our approach.

Can We Use Pixel-to-Pixel Regression for Converting VIIRS Radiance to DMSP?
We have compared our result with the results obtained in [11]. As part of this work, the authors intercalibrated the DMSP data between years (for the years when the DMSP observations were made) and simulated the annual DMSP-like radiances from VIIRS (for those years in which there were no DMSP observations) with sigmoid regression. The result of this work is also a set of annual NTL files in open access. We took the simulation result for 2015 from [11] and compared it with the annual DMSP F15 NTL map for 2015 (link to open annual DMSP data for 2015 or the form where this data can be ordered). For comparison, we plotted histogram of pixel brightness in Sicily resulting from both crosscalibration methods in Figure 12. It shows that the consistency of histograms is higher for our approach. For the NN predictions, we have plotted sums of lights and pixel brightness histograms in Sicily (Figures 13 and 14) for the years from 2013 to 2019 where the observed SNPP-VIIRS and DMSP F15 datasets overlap. Both the sum of lights in Figure 13 and their statistical distributions in Figure 14 match well for all the years. The difference in sum of lights for 2013-2019 does not exceed 15%, with the biggest difference happening in 2013.
For the NN predictions, we have plotted sums of lights and pixel brightness histograms in Sicily (Figures 13 and 14) for the years from 2013 to 2019 where the observed SNPP-VIIRS and DMSP F15 datasets overlap. Both the sum of lights in Figure 13 and their statistical distributions in Figure 14 match well for all the years. The difference in sum of lights for 2013-2019 does not exceed 15%, with the biggest difference happening in 2013.

Conclusions
This work provides a method based on Convolutional Neural Networks to generate analogs of DMSP data from VIIRS data. Our solution proves to work in various geographical areas. Usage of CNN for this purpose provides great accuracy (0.96 < R 2 < 0.99) and produces images that are visually indistinguishable from original DMSP images. Our

Conclusions
This work provides a method based on Convolutional Neural Networks to generate analogs of DMSP data from VIIRS data. Our solution proves to work in various geographical areas. Usage of CNN for this purpose provides great accuracy (0.96 < R 2 < 0.99) and produces images that are visually indistinguishable from original DMSP images. Our method preserves the distribution of midrange values and proves to work well in realworld scenarios. All this makes it possible to complete long-term NTL series and to utilize both DMSP and VIIRS data in potential studies. In order to make usage of the CNN as easy as possible, in the Supplementary Materials to this paper we will publish the weights of the trained neural network and the source code to generate DMSP-like NTL maps from VIIRS with this network. Observed DMSP and VIIRS nighttime lights for 2012-2019 as well as the NN predicted DMSP-like data are in open access from the EOG web site at Colorado School of Mines.  Data Availability Statement: Publicly available datasets were analyzed in this study. All the NTL data used for training and validation of the neural network are available at https://eogdata.mines. edu/products/vnl/ (accessed on 8 December 8 2021) and https://eogdata.mines.edu/products/ dmsp/ (accessed on 8 December 2021).

Conflicts of Interest:
The authors declare no conflict of interest.