## Abstract

An average coefficient of determination ($R^2$ score) of 0.89 was obtained, with a maximum of 0.95.

## 1. Introduction

... a coefficient of determination ($R^2$ score) ranging between 0.81 and 0.98, considerably higher than that of the other models evaluated.

- (1) To the best of the authors’ knowledge, the first BERT model trained from scratch with solar irradiance data is introduced;
- (2) The implementation is evaluated for time series imputation in two scenarios, namely (1) the imputation of a single missing value at a specific position and (2) the imputation of a missing value when all values after that position in the sequence are also missing.
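The two evaluation scenarios can be illustrated with a minimal masking sketch. It is not the authors' preprocessing code: the `MASK` sentinel, the function names `mask_single` and `mask_from`, and the sample values are hypothetical, and the second function reflects one reading of scenario (2), in which the target value and every later value in the sequence are hidden.

```python
from typing import List, Optional

# Hypothetical sentinel standing in for BERT's [MASK] token.
MASK = None


def _check_position(seq: List[float], pos: int) -> None:
    """Reject positions that fall outside the sequence."""
    if not 0 <= pos < len(seq):
        raise IndexError(f"position {pos} is outside a sequence of length {len(seq)}")


def mask_single(seq: List[float], pos: int) -> List[Optional[float]]:
    """Scenario 1: hide only the value at `pos`; all other values stay known."""
    _check_position(seq, pos)
    masked: List[Optional[float]] = list(seq)
    masked[pos] = MASK
    return masked


def mask_from(seq: List[float], pos: int) -> List[Optional[float]]:
    """Scenario 2 (assumed reading): hide the value at `pos` and every later value."""
    _check_position(seq, pos)
    return list(seq[:pos]) + [MASK] * (len(seq) - pos)


# Made-up irradiance readings, for illustration only.
irradiance = [0.0, 120.5, 340.2, 510.8, 430.1, 150.3]
scenario_1 = mask_single(irradiance, 3)
scenario_2 = mask_from(irradiance, 3)
```

In both cases the model would be asked to predict the value at position `pos`; the scenarios differ only in how much context after that position remains visible.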

## 2. Methodology

#### 2.1. Studied Model (BERT)

#### 2.2. Data Description

#### 2.3. Methodology

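The evaluation metric is the $R^2$ score, a statistical measure that indicates how well a model fits the observed data. The $R^2$ score is calculated from the actual value ($y_i$) and the predicted value ($\widehat{y}_i$) with Equation (1), where $\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$ is the mean of the actual values.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \widehat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (1)$$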
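As an illustration, the $R^2$ score can be computed in a few lines of plain Python. The sketch assumes the standard definition of the coefficient of determination (one minus the ratio of the residual sum of squares to the total sum of squares); the function name `r2_score` and the sample values are illustrative and are not taken from the authors' code.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition, assumed)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    y_mean = sum(y_true) / n  # mean of the actual values

    # Residual sum of squares: squared error between actual and predicted values.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)

    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only.
perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A model that reproduces every actual value scores 1, while a model that always predicts the mean of the actual values scores 0; predictions worse than the mean yield negative scores.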

## 3. Results and Discussion

#### 3.1. Scenario 1: Imputation of a Single Missing Value at a Specific Position

The $R^2$ score in the set ranged from 0.83 to 0.95, as shown in Figure 3, with a mean of 0.89 and a variance of 0.13.

#### 3.2. Scenario 2: Imputation of a Missing Value after Several Unknown Values at a Random Position

The $R^2$ score within the sequence increases as the number of unknown values decreases (as shown in Figure 4). The $R^2$ score ranged from 0.08 to 0.93, with a mean of 0.59 and a variance of 0.25.

## 4. Conclusions and Future Work

The model was evaluated with the $R^2$ score; the average performance was 0.89, and the best result, 0.95, was obtained for imputation on the final values of the sequence.

**Figure 1.** Representation of the BERT input for sequences of irradiance values. BERT’s input embeddings are the sum of the token, segment, and position embeddings (the sum of each column). Notes: (1) the token embedding, shown in yellow, represents the value in another dimensional space; (2) the segment embedding indicates which sentence the token belongs to, $\mathrm{A}$ or $\mathrm{B}$; and (3) the position embedding represents the position in the sequence. Abbreviations: $\mathrm{CLS}$ identifies the beginning of the sequence, $\mathrm{E}_{\mathrm{A}}$ encodes the segment embedding, and $\mathrm{E}_{1}, \mathrm{E}_{2}, \dots, \mathrm{E}_{n+1}$ are the position embeddings.

**Figure 2.** Spatial distribution of the stations that captured the data used in the experiments, located in the Spanish regions of (**a**) Castile and León (the CyL-GHI dataset) and (**b**) Galicia.

**Figure 3.** Analysis of the coefficient of determination for the position of the MASK token within the sequence.

**Figure 4.** Analysis of the coefficient of determination for the position of the MASK token within the sequence with unknown values.

