1. Introduction
The Advanced Space-based Solar Observatory (ASO-S) was launched in October 2022, with the scientific objective of studying the relationships between the solar magnetic field, solar flares, and coronal mass ejections [1]. The Full-disk Magnetograph (FMG) is one of the three payloads on ASO-S and operates on the Fe I 5324.19 Å spectral line [2]. As a filter-type spectrometer, the FMG performs polarization observations on one side of the line center in its regular mode, at a position of about −0.08 Å. Regular FMG observations have a temporal resolution of approximately 2 min, a spatial resolution of about 1.5 arcseconds, and a pixel size of approximately 0.5 arcseconds [2].
The single-wavelength polarization data of FMG do not allow magnetic field inversion, so magnetic field data can only be obtained through linear calibration. Under the weak-field approximation, the line-of-sight (LOS) magnetic field can be derived from the circular polarization parameter:
$$B_{L} = C_{L}\,\frac{V}{I},$$
where $C_{L}$ is the calibration coefficient, $V$ and $I$ are the Stokes components, and $B_{L}$ is the LOS magnetic field. The linear calibration coefficient used for FMG on-orbit data is given in [3]. However, linear calibration encounters the issue of magnetic saturation, leading to incorrect calibration results for strong magnetic fields. Xu et al. [4] compared the LOS magnetic fields of FMG and the Helioseismic and Magnetic Imager (HMI), highlighting the magnetic saturation effect [5, 6].
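The weak-field relation above amounts to a per-pixel scaling of V/I. A minimal sketch of such a linear calibration follows; the coefficient value and the function name are placeholders chosen for illustration and are not the FMG on-orbit values.

```python
import numpy as np

# Hypothetical placeholder, NOT the FMG on-orbit coefficient (see [3] for that).
C_LOS_GAUSS = 1.0e4


def linear_calibration(stokes_v, stokes_i, coeff=C_LOS_GAUSS):
    """Weak-field estimate of the LOS field: B_LOS = coeff * V / I."""
    stokes_v = np.asarray(stokes_v, dtype=float)
    stokes_i = np.asarray(stokes_i, dtype=float)
    ratio = np.zeros_like(stokes_v)
    # Leave pixels with zero intensity (e.g. off-disk) at zero instead of dividing.
    np.divide(stokes_v, stokes_i, out=ratio, where=stokes_i != 0)
    return coeff * ratio


v = np.array([[0.0, 5.0], [-10.0, 2.0]])
i = np.array([[1000.0, 1000.0], [1000.0, 0.0]])
b_los = linear_calibration(v, i)
```

Because the mapping is a single multiplication, it cannot follow the turnover of V/I in strong fields, which is the saturation problem described above.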
Additionally, as a spaceborne instrument, FMG faces challenges in magnetic field calibration due to wavelength shifts caused by the Doppler effect. The relative motion between the detector and the Sun results in a shift in the observed wavelength position. The calibration coefficients vary at different wavelength positions, meaning that the calibration relationship changes over time as the LOS component of the satellite’s orbital velocity changes. This effect is further complicated by the influence of the LOS component of the Sun’s rotational velocity. As a result, different calibration relationships may exist across various positions on the solar disk at the same observation time.
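To give a sense of scale, the sketch below converts a LOS velocity into a wavelength shift with the non-relativistic Doppler formula and applies it to the nominal −0.08 Å filter position. It assumes that the filter position is fixed in the instrument frame and that positive velocity means the instrument and the Sun are moving apart; the function names are illustrative.

```python
C_LIGHT = 299_792_458.0   # speed of light in m/s
LINE_CENTER = 5324.19     # Fe I line used by FMG, in angstroms
NOMINAL_OFFSET = -0.08    # regular-mode filter offset from line center, in angstroms


def doppler_shift(v_los, wavelength=LINE_CENTER):
    """Non-relativistic line shift in angstroms for a LOS velocity in m/s.

    Positive velocity (receding) gives a positive (red) shift.
    """
    return wavelength * v_los / C_LIGHT


def effective_offset(v_los, nominal=NOMINAL_OFFSET):
    """Filter position relative to the shifted line center, in angstroms.

    Assumes the filter wavelength is fixed in the instrument frame.
    """
    return nominal - doppler_shift(v_los)


for v in (-3000.0, 0.0, 3000.0):
    print(f"{v:+7.0f} m/s -> offset {effective_offset(v):+.3f} A")
```

Under these assumptions, a relative velocity of 3 km/s corresponds to a shift of roughly 0.05 Å, which is comparable to the nominal offset itself.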
In traditional calibration methods, the issue of wavelength shift is typically addressed using a tabulation approach. Calibration coefficients are tabulated for different relative velocities, and the calibration relationship for various regions on the solar disk is determined by referencing these tables based on the LOS component of the orbital velocity at the time of observation. This method requires a sufficient amount of observational data, covering the entire range of orbital velocities, to accurately calculate the calibration coefficients.
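The tabulation approach can be sketched as a lookup with interpolation. The table values below are invented for illustration, and linear interpolation between entries is an assumption rather than a description of any operational pipeline.

```python
import numpy as np

# Hypothetical table: LOS velocity (m/s) -> calibration coefficient (G per unit V/I).
# Placeholder numbers for illustration, not measured FMG coefficients.
TABLE_VELOCITY = np.array([-4000.0, -2000.0, 0.0, 2000.0, 4000.0])
TABLE_COEFF = np.array([6000.0, 8000.0, 10000.0, 13000.0, 17000.0])


def lookup_coefficient(v_los):
    """Linearly interpolate the tabulated coefficient at the given velocity."""
    return np.interp(v_los, TABLE_VELOCITY, TABLE_COEFF)


def tabulated_calibration(v_over_i, obs_vr, rot_vr):
    """Calibrate V/I pixel by pixel using the total LOS velocity.

    obs_vr is the orbital LOS velocity (scalar) and rot_vr the per-pixel
    rotational LOS velocity, both in m/s.
    """
    total_velocity = obs_vr + np.asarray(rot_vr, dtype=float)
    return lookup_coefficient(total_velocity) * np.asarray(v_over_i, dtype=float)


b_los = tabulated_calibration(np.array([0.01, 0.01]), 1000.0, np.array([0.0, 1000.0]))
```

The two pixels in the usage line share the same V/I but receive different field values because their total LOS velocities differ, which is why the table must cover the full velocity range.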
To address the issues of magnetic saturation and wavelength shift in single-wavelength calibration, we use a Convolutional Neural Network (CNN), a machine learning (ML) method, to construct a calibration model. The model is trained to learn the mapping between single-wavelength polarization data and the LOS magnetic field obtained from multi-wavelength polarization inversion. We expect the trained model to output the correct LOS magnetic field from single-wavelength observations, free from the influence of magnetic saturation and wavelength drift.
Given the powerful nonlinear fitting capabilities of machine learning, many researchers have attempted to apply these methods to magnetic field inversion and calibration since the beginning of the 21st century. Carroll and Staude [7] were among the first to utilize a multi-layer perceptron (MLP) for fitting the Stokes inversion of 81 wavelength points, allowing them to derive various parameters, including the total magnetic field [8, 9]. Socas-Navarro [10] introduced an inversion method based on principal component analysis (PCA), which was also employed for preprocessing the neural network input data [11, 12]. This approach effectively reduced data dimensionality, enhancing the speed of network predictions (see [10]). Carroll et al. [13] applied MLP to model the radiative transfer involved in Zeeman–Doppler imaging and Stokes profile inversion. In another study, Teng [14] leveraged statistical machine learning techniques based on the Mercer kernel to deduce the photospheric magnetic field from polarization data. Asensio Ramos and Díaz Baso [15] were the first to utilize a CNN in the calibration of solar magnetic fields. Guo et al. [16, 17] employed both MLP and CNN for magnetic field inversion using Hinode/SP data, showcasing the practicality of neural networks in single-wavelength magnetic field calibration [18]. Higgins et al. [19] applied a UNet architecture for inverting HMI data and implemented regression-by-classification in their output process. Mistryukova et al. [20] designed an end-to-end inversion code based on neural networks and the Milne–Eddington (ME) model, providing both stellar atmosphere parameter estimates and their uncertainty intervals. Before the launch of FMG, we simulated the on-orbit single-wavelength polarization observations using HMI data and developed a calibration method based on neural networks [21].
In Section 2, we introduce the data sources used to construct the dataset and the data preprocessing methods. In Section 3, we introduce the CNN model and the training methods, and present the output of the model. In Section 4, we examine in detail how the output of the CNN model corrects magnetic saturation and wavelength shift.
2. Data and Preprocess
2.1. Data
We use level 1 FMG data as input for the model, which include the four Stokes channels I, Q, U, and V. Compared to level 0 data, these level 1 data are corrected for dark field effects and cross-talk effects. Because the LOS magnetic field is correlated with Stokes V, we select the I and V channels as inputs for the model.
For the model to learn the magnetic field parameters corresponding to the polarization parameters, we need to provide the corresponding magnetic field parameters as targets during training. These target LOS magnetic fields should be free of magnetic saturation and spectral line drift. We therefore use data from HMI as the output target. HMI performs 6-point polarization observations at the 6173.34 Å spectral line, outputting LOS magnetic fields with a 45 s cadence and 6-point polarization and vector magnetic field data with a 720 s cadence [5, 22]. The HMI vector magnetic field is derived by solving the Milne–Eddington (ME) model using the VFISV code [23], which avoids the issues of magnetic saturation and wavelength shift that occur with single-point observations.
We need to use a monochromatic image to register the HMI and FMG images. For this purpose, we select 720 s data from HMI to generate targets in the dataset. The 720 s data of HMI include polarization parameters for six complete wavelength points, as well as multiple magnetic field and atmospheric parameters. These parameters include the total magnetic field, inclination angle, and azimuth angle. We calculate the LOS magnetic field from these. The processing of HMI data and cross instrument alignment methods are detailed in the preprocessing section.
Using HMI as the target for neural network learning does not mean converting the FMG polarization image into the HMI longitudinal magnetic map. The two instruments operate on different spectral lines, and this calibration method is expected to retain information in the 5324 Å polarization map that differs from that in the 6173 Å magnetic map. In addition, it yields a longitudinal magnetic map at the 2 min FMG observation cadence with the same calibration relationship as the HMI 720 s data.
Due to the large volume of data, we select one day every five days from May to August to create the dataset. Data from 24 days are selected and split into training, validation, and test sets in a 6:2:2 ratio, resulting in 1246, 462, and 423 groups of FMG-HMI images, respectively.
Table 1 lists the data sources used to construct our dataset. After preprocessing and cropping, the final numbers of 256 × 256 pixel sized image data groups in the three datasets are 4099, 1424, and 1419, respectively.
2.2. Preprocess and Dataset
The preprocessing required for FMG1.5 level data before entering the network only includes normalization, image cropping, and co-alignment. Each I-channel image is divided by its own median to scale it to the order of 1. The V channel is actually V/I data, with values well below 1. We scale the V channel to the order of 1 by multiplying it by a constant coefficient.
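The two normalization steps can be sketched as follows. The value of the V/I scaling constant is not stated here, so `V_SCALE` is a hypothetical placeholder, and the function names are illustrative.

```python
import numpy as np

# Hypothetical constant; the actual V/I scaling coefficient is not given in the text.
V_SCALE = 100.0


def normalize_i(stokes_i):
    """Divide an intensity image by its own median so typical values are ~1."""
    stokes_i = np.asarray(stokes_i, dtype=float)
    return stokes_i / np.median(stokes_i)


def normalize_v(v_over_i, scale=V_SCALE):
    """Multiply V/I by a constant so the signal is of order 1."""
    return np.asarray(v_over_i, dtype=float) * scale


i_image = np.array([[900.0, 1000.0], [1100.0, 1000.0]])
v_image = np.array([[0.002, -0.01], [0.0, 0.005]])
i_norm = normalize_i(i_image)
v_norm = normalize_v(v_image)
```

Using each image's own median for I removes frame-to-frame brightness differences, whereas a single constant for V preserves the relative polarization amplitude between frames.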
Additionally, to account for the Doppler effect, we need to input the velocity information into the network. The relative velocity in this observation is caused by the LOS component of the Sun’s surface rotational velocity and the satellite’s orbital velocity.
The LOS component of the satellite’s orbital velocity is obtained from the FITS header of the data, where the keyword ‘OBSVR’ represents this value. For a given moment’s data, we expand this value into an image with the same dimensions as the observational data, where each pixel holds the same value.
The Sun’s differential rotational velocity is calculated using the theoretical formula. By applying the map function from Sunpy to the data, the longitude and latitude of each pixel on the solar disc can be determined [24]. The latitude and longitude images are projection-corrected and adjusted for the B-angle. The B-angle is the angle between the solar rotation axis and the image plane, which arises from the inclination of the solar equatorial plane to the ecliptic plane. It is also equivalent to the heliographic latitude of the disc center in the image. These latitude and longitude images are then used in the rotation formula to produce a rotational velocity image, with the velocity direction aligned along the solar disc’s latitudinal lines. The parameters of solar differential rotation come from Newton and Nunn [25] and Timothy et al. [26]. This velocity is projected onto the line of sight of the detector to obtain the LOS component of the rotational velocity, which we refer to as the ‘ROTVR’ image, consistent with the ‘OBSVR’ image.
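A minimal sketch of the ROTVR calculation is given below. It assumes the common rotation law ω = A + B sin²φ + C sin⁴φ, longitude measured from the central meridian and positive toward the west limb, and positive velocity for receding motion. The coefficients are illustrative placeholders and are not the parameters of [25, 26]; the longitude and latitude arrays are assumed to have been computed beforehand.

```python
import numpy as np

R_SUN_M = 6.957e8  # nominal solar radius in metres
SECONDS_PER_DAY = 86400.0
# Placeholder coefficients in deg/day for omega = A + B sin^2(lat) + C sin^4(lat).
A_DEG, B_DEG, C_DEG = 14.4, -2.8, 0.0


def rotation_los_velocity(lon_deg, lat_deg, b0_deg=0.0):
    """LOS component of the surface rotation velocity in m/s (positive = receding)."""
    lon = np.deg2rad(lon_deg)
    lat = np.deg2rad(lat_deg)
    b0 = np.deg2rad(b0_deg)
    sin2 = np.sin(lat) ** 2
    # Angular velocity in rad/s at each latitude.
    omega = np.deg2rad(A_DEG + B_DEG * sin2 + C_DEG * sin2**2) / SECONDS_PER_DAY
    # Speed along the latitude circle, then projection onto the line of sight.
    v_rot = omega * R_SUN_M * np.cos(lat)
    return v_rot * np.sin(lon) * np.cos(b0)


lon_grid, lat_grid = np.meshgrid(np.linspace(-60, 60, 5), np.linspace(-30, 30, 3))
rot_vr = rotation_los_velocity(lon_grid, lat_grid, b0_deg=1.0)
```

With these placeholder coefficients the equatorial limb velocity is about 2 km/s, so the rotational term is of the same order as the orbital velocity and cannot be neglected across the disc.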
We use ROTVR and OBSVR as two additional channels for the model input. Similarly, these two channels are normalized during input by dividing them by a scaling factor, which we set to 3000 m/s.
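Assembling the four input channels can be sketched as below. The 3000 m/s scaling factor is the one quoted above; the channel order, the channels-first layout, and the function name are assumptions made for illustration.

```python
import numpy as np

VELOCITY_SCALE = 3000.0  # m/s, scaling factor quoted in the text


def build_input(i_norm, v_norm, obs_vr, rot_vr):
    """Stack I, V, OBSVR and ROTVR into one (4, H, W) array.

    obs_vr is the scalar header value; it is expanded to a constant image.
    rot_vr is the per-pixel rotational LOS velocity image in m/s.
    """
    obs_image = np.full(i_norm.shape, obs_vr, dtype=float)
    return np.stack(
        [
            np.asarray(i_norm, dtype=float),
            np.asarray(v_norm, dtype=float),
            obs_image / VELOCITY_SCALE,
            np.asarray(rot_vr, dtype=float) / VELOCITY_SCALE,
        ]
    )


h, w = 256, 256
sample = build_input(np.ones((h, w)), np.zeros((h, w)), 1500.0, np.full((h, w), -600.0))
```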
Most of the pixels on the solar surface belong to quiet regions with weak magnetic fields, so directly inputting full-disk images during training would waste a lot of training time. To generate the dataset, we crop the images into smaller patches around the active regions. Because each local input image needs a corresponding target image, we perform the cropping during the step of aligning FMG with HMI.
One of the challenging steps in constructing the dataset is aligning the FMG and HMI images. To achieve a sufficiently precise alignment, we design an iterative alignment method that undergoes multiple iterations. This method is similar to the one used in processing SUTRI data [27].
We first use the SIFT algorithm to align the full-disk HMI images with the FMG images [28]. For the majority of the data, this approach results in alignment errors within dozens of pixels. However, due to the nature of the data, some images cannot be matched effectively by SIFT, either because they lack enough feature points or because incorrect feature point matches are made. This often leads to an exaggerated transformation matrix. We apply a set of thresholds to the transformation matrices to discard the data pairs that cannot be initially aligned.
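The rejection of exaggerated transformation matrices can be sketched as a plausibility check. The sketch assumes the feature matches have already been reduced to a 2 × 3 similarity matrix (rotation, uniform scale, translation); the threshold values and the function name are hypothetical, since the actual thresholds are not given here.

```python
import numpy as np


def is_plausible_transform(matrix, scale_range=(0.9, 1.1),
                           max_rotation_deg=2.0, max_shift_px=100.0):
    """Return True if a 2x3 similarity matrix stays within the given limits.

    The matrix is assumed to have the form
    [[s*cos(t), -s*sin(t), tx], [s*sin(t), s*cos(t), ty]].
    All thresholds are illustrative placeholders.
    """
    m = np.asarray(matrix, dtype=float)
    scale = np.hypot(m[0, 0], m[1, 0])
    rotation_deg = np.degrees(np.arctan2(m[1, 0], m[0, 0]))
    shift = np.hypot(m[0, 2], m[1, 2])
    return bool(
        scale_range[0] <= scale <= scale_range[1]
        and abs(rotation_deg) <= max_rotation_deg
        and shift <= max_shift_px
    )


good = [[1.0, 0.0, 12.0], [0.0, 1.0, -7.0]]
bad = [[2.0, 0.0, 12.0], [0.0, 2.0, -7.0]]  # implausible factor-of-two rescaling
results = (is_plausible_transform(good), is_plausible_transform(bad))
```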
To obtain precisely aligned FMG-HMI data pairs, we perform a second, more accurate alignment on the cropped images. We use cross-correlation to achieve precise alignment of active region images. The alignment process provides the relative displacement between images and the cross-correlation coefficient after alignment.
Some images are difficult to align due to factors such as the differing point spread functions (PSFs) and distortions of the two instruments. We apply a cross-correlation coefficient threshold to discard those active region images that cannot be precisely aligned.
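The second alignment stage can be sketched with an FFT-based cross-correlation that returns the displacement and the correlation coefficient after alignment. This simplified version handles integer-pixel shifts with periodic boundaries only, and the threshold `MIN_CORRELATION` is a hypothetical value.

```python
import numpy as np

MIN_CORRELATION = 0.8  # hypothetical acceptance threshold


def cross_correlation_shift(reference, image):
    """Integer (dy, dx) to apply to `image` with np.roll so it matches `reference`."""
    ref = reference - reference.mean()
    img = image - image.mean()
    # Circular cross-correlation computed through the Fourier domain.
    corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(img))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks beyond the midpoint wrap around to negative shifts.
    return tuple(int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, corr.shape))


def align(reference, image):
    """Shift `image` onto `reference`; return the shift and the correlation after alignment."""
    dy, dx = cross_correlation_shift(reference, image)
    shifted = np.roll(image, (dy, dx), axis=(0, 1))
    coeff = float(np.corrcoef(reference.ravel(), shifted.ravel())[0, 1])
    return (dy, dx), coeff


rng = np.random.default_rng(0)
reference = rng.normal(size=(64, 64))
moved = np.roll(reference, (3, -5), axis=(0, 1))
shift, coeff = align(reference, moved)
keep = coeff >= MIN_CORRELATION
```

A production pipeline would likely need sub-pixel refinement and non-periodic edge handling, which this sketch omits.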
4. Discussion
To evaluate the performance of the CNN method in the single-wavelength calibration task, we conduct the following series of analyses in this section. We compare the LOS magnetic fields from linear calibration, from the CNN calibration, and from the original HMI data, and examine the correlations between them. We examine the relationship between the CNN output and the target at different velocities to evaluate the stability of the CNN under different wavelength shifts. We also track the I images, the linear calibration, and the CNN output of an active region at different orbital velocities, as reflected in OBSVR. In addition, we discuss how the median value of the quiet region and the magnetic flux of the active region vary with OBSVR.
Figure 2 demonstrates the correction of the magnetic saturation effect using the CNN calibration method. We find two examples from the test set where FMG experienced magnetic saturation during observation and compare the LOS magnetic fields obtained using linear calibration and the CNN method with the HMI LOS magnetic field. Additionally, scatter plots are created for each pair of these three magnetic fields to assess their correlation. It can be seen that in regions with stronger magnetic fields, such as the sunspot centers, the linear calibration shows clear signs of magnetic saturation. In contrast, the LOS magnetic field obtained using the CNN method no longer exhibits saturation and shows a higher correlation with the HMI magnetic field derived from inversion.
To assess the calibration accuracy of our model under different spectral line shifts, we plot scatter plots of the model’s output at various relative motion velocities. As shown in Figure 3, each panel represents a scatter plot of the model’s output versus the target at different velocities, with color indicating the density of the points. Similarly, the closer the points are to the red line, the better the prediction matches the target. The velocity labeled in each panel is the sum of OBSVR and ROTVR for a pixel in the data. It can be observed that the distribution of predictions remains relatively consistent across different velocities. This indicates that the prediction by the CNN model is almost unaffected by the wavelength shift.
In Figure 4, we present the output results of the network model during a single orbit. The data are from 13 June 2023, and we track and plot the active region NOAA AR13331. The three timestamps shown in the figure correspond to FMG OBSVR values of 4109 m/s, −6 m/s, and −3707 m/s. These represent the maximum positive and negative orbital velocities, as well as a moment when the velocity is near zero. Because the satellite spends part of the orbit in Earth’s shadow, the chronological order of the images is a, c, and b.
It can be observed that the monochromatic images captured by FMG show significant brightness changes at these three timestamps. In the V/I images, some small magnetic field structures exhibit different areas and intensities at different velocities. However, the CNN output at each timestamp displays a more stable background, magnetic field strength, and structure. The variations in I and V/I images at different times are due to the impact of spectral line shifts. When the actual observed wavelength is closer to the line center, the I image appears darker, and when it is farther, the image appears brighter. In V and V/I images, the relationship between the wavelength position and pixel Digital Number (DN) value is more complex. The figure demonstrates that the CNN model, compared to linear calibration, is better suited for handling the calibration relationship under different orbital velocities.
We track the changes in the magnetic flux of AR13331 on 13 June 2023 under different OBSVR, using the data from the two calibration methods. We separately calculate the mean magnetic flux of the LOS magnetic field above 300 G and the mean magnetic flux of the LOS magnetic field below −300 G. The mask for selecting pixels consists of the pixels in the CNN output with an absolute value greater than 300 G. The positive and negative mean magnetic fluxes of the two LOS magnetic fields and their corresponding orbital velocities are shown in Figure 5. Because the pixels are selected according to the absolute values of the CNN output, the mean magnetic flux of the linear calibration can fall below the 300 G threshold. In addition, because observations are missing for part of each orbit, there are jumps in the curve. We can see that the absolute value of the CNN flux is greater than that of the linearly scaled flux. This is because of the magnetic saturation effect, which results in a smaller linear calibration magnetic field at the center of the sunspot. The linear calibration magnetic flux exhibits periodic oscillations with the OBSVR, and the fluctuation amplitude is much larger than that of the CNN magnetic flux.
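The masked statistic described above can be sketched as follows. The 300 G threshold is taken from the text; the function name and the toy arrays are illustrative.

```python
import numpy as np

THRESHOLD_G = 300.0  # selection threshold quoted in the text


def masked_means(b_cnn, b_map):
    """Mean of `b_map` over pixels where the CNN output is above +300 G or below -300 G."""
    b_cnn = np.asarray(b_cnn, dtype=float)
    b_map = np.asarray(b_map, dtype=float)
    positive = b_cnn > THRESHOLD_G
    negative = b_cnn < -THRESHOLD_G
    mean_pos = float(b_map[positive].mean()) if positive.any() else float("nan")
    mean_neg = float(b_map[negative].mean()) if negative.any() else float("nan")
    return mean_pos, mean_neg


b_cnn = np.array([[1500.0, 400.0], [-800.0, 10.0]])
b_linear = np.array([[350.0, 150.0], [-280.0, 5.0]])  # toy values mimicking saturation
cnn_means = masked_means(b_cnn, b_cnn)
linear_means = masked_means(b_cnn, b_linear)
```

In this toy case the linear means are 250 G and −280 G, both inside the threshold, which illustrates how a mask defined on the CNN output allows the linear curve to fall below 300 G.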
In Figure 6, we present the variation of median values in the quiet regions at different times of the day. We statistically analyze the data from 13 May and 7 August 2023, which are displayed as panels a and b, respectively. In the first graph of each panel, we show the median of the LOS magnetic field for pixels below 300 G at different times of the day, and in the second graph, we present the corresponding OBSVR of the data. It can be observed from the graphs that both the linear calibration and the CNN-derived median values in the quiet regions exhibit periodic changes with OBSVR. From the statistical results, we believe that there is a bias in the magnetic field observed by the instrument. The linear calibration method reveals the magnitude of this bias and its variation with OBSVR. However, the bias in the LOS field obtained by the CNN model is smaller and more stable with respect to the OBSVR.
In addition, the quiet-region bias of the CNN output on 3 May shows much larger fluctuations than on 7 August. We believe this is due to the larger positive OBSVR on that day. The OBSVR range on 3 May is −2986.82 m/s to 3769.77 m/s, while the range on 7 August is −3399.43 m/s to 2907.13 m/s. Positive velocity indicates that the instrument and the Sun are moving away from each other. Because the regular observation wavelength position is −0.08 Å, an excessive positive velocity may move the actual observation position beyond the spectral line width. In this case, the V/I signal becomes small enough to lower the signal-to-noise ratio. This causes the quiet-region bias calculated by the CNN to exhibit significant fluctuations at sufficiently high speeds.
5. Conclusions
To deal with the magnetic saturation and wavelength shift in the calibration of FMG LOS magnetic fields, we designed a CNN-based single-wavelength nonlinear calibration method. Our model takes as input the Stokes I and V images from FMG level 1 data, along with the LOS components of the satellite orbital velocity and the solar rotational velocity. The labels used for training the model are the LOS magnetic field images from HMI 720 s data.
On the test set, the model’s prediction of the LOS magnetic field, compared to the HMI LOS magnetic field used as labels, has an MAE of 19.87 G, an RMSE of 38.61 G, and a coefficient of determination of 0.969. Our model effectively corrects the magnetic saturation effect in the FMG LOS magnetic field obtained from linear calibration, demonstrating good agreement with the HMI LOS magnetic field derived from VFISV. For data across different satellite speeds during an orbit, the model remains unaffected by brightness variations in the original data. By analyzing the variation of the mean magnetic flux in the active region with orbital velocity, we find that the LOS magnetic field obtained by the CNN method is more stable than that from linear calibration when the OBSVR changes. We also tracked the median value of the quiet region, which serves as the bias of the magnetic field image, to observe how it changes with orbital speed throughout the day. The results indicate that our model exhibits smaller bias and lower fluctuations compared to linear calibration.
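For reference, the three metrics quoted above can be computed as in the sketch below. It uses the standard definitions of MAE, RMSE, and the coefficient of determination over all pixels; whether the reported values were computed in exactly this way, or over the same pixel selection, is an assumption.

```python
import numpy as np


def regression_metrics(prediction, target):
    """Return (MAE, RMSE, R^2) between predicted and target field maps, in G."""
    prediction = np.asarray(prediction, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    error = prediction - target
    mae = float(np.mean(np.abs(error)))
    rmse = float(np.sqrt(np.mean(error**2)))
    # R^2 = 1 - residual sum of squares / total sum of squares.
    r2 = float(1.0 - np.sum(error**2) / np.sum((target - target.mean()) ** 2))
    return mae, rmse, r2


pred = np.array([10.0, -20.0, 30.0, 0.0])
true = np.array([12.0, -18.0, 27.0, -1.0])
mae, rmse, r2 = regression_metrics(pred, true)
```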
From these analyses, we conclude that in the single-wavelength LOS magnetic field calibration task, the CNN handles magnetic saturation better than linear calibration and can adapt to different wavelength drifts. Interestingly, cross-instrument learning can also correct biases in the data to a certain extent.
However, it should be noted that there are still some issues with the CNN method. Due to the limitations of single-wavelength data, some fluctuations may still be observed at extremely high speeds, as can be seen from the bias in the quiet region. This fluctuation appears only near one of the two velocity extremes, positive or negative, depending on which side of the line center the observation position lies. In addition, the rotational speed of the solar surface is calculated from theoretical formulas and is therefore influenced by the choice of differential rotation parameters. In subsequent work, we will use different differential rotation parameters to obtain the theoretical solar differential rotation velocity and compare the impact on the magnetic field calibration [30, 31].