A Transformer Model for Coastline Prediction in Weitou Bay, China

Abstract: The simulation and prediction of coastline changes are of great significance for the development and scientific management of coastal zones. Coastline changes are difficult to capture completely but appear significantly periodic over a long time series. In this paper, the transformer model is used to learn the changing trend of the coastline so as to deduce the position of the coastline in the coming year. First, we use the distance regularization level set evolution (DRLSE) model for instantaneous waterline extraction (IWE) from preprocessed Landsat time-series images from 2010–2020 in Weitou Bay, China. Then, tidal correction (TC) is performed on the extracted instantaneous waterline dataset to obtain coastlines projected to a single reference tidal datum. Finally, the coastline datasets from 2010–2019 are used for model training, and the coastline in 2020 is used for accuracy assessment. Three precision evaluation methods, namely receiver operating characteristic curve matching, the mean offset, and the root mean square error, were used to verify the predicted coastline data. The receiver operating characteristic curve was specifically designed and improved to evaluate the accuracy of the obtained coastline. Compared with the support vector regression (SVR) and long short-term memory (LSTM) methods, the coastline predicted by the transformer model was the closest to the accurately extracted coastline. The correct values corresponding to the SVR, LSTM, and transformer models were 88.27%, 94.08%, and 98.80%, respectively, where the correct value indicates the accuracy of the coastline results. Additionally, the mean offset and root mean square error were 0.32 pixels and 0.57 pixels, respectively. The experimental results also showed that tidal correction is important for coastline prediction. Moreover, field investigations of coastlines showed that the predicted results were more accurate for natural coastlines and relatively poor for some artificial coastlines that were intensely influenced by human activities. This study shows that the transformer model can provide natural coastline changes for coastal management.


Introduction
A coastline is generally defined as the interface between sea and land, which is dynamic in nature [1,2]. Coastlines are inherently uncertain and constantly changing, influencing the development of coastal areas and cities [3]. At longer time scales, coastlines change with sediment supply, coastal transport, and long-term sea level changes [4,5]. At shorter time scales, the factors that influence shoreline changes are tidal processes, wave formations, and storm surges [6,7]. The changes in coastlines, especially the erosion of coastlines, often cause great economic losses in coastal areas, harm the ecological environment, and even threaten human safety. Therefore, governments worldwide have long attached great importance to coastline changes [8,9]. Predicting the future locations of coastlines is important for long-term planning and policies in coastal areas [10].
Several studies have been conducted on coastline forecasting, and the methods related to coastline changes are generally divided into three categories, namely, numerical models, physical models, and field measurement data analysis methods [11]. A numerical model simulates coastline changes through mathematical functions to predict long-term coastline changes [12]. The current numerical models include the numerical one-line theory model [13], empirical orthogonal function (EOF) [14], and GENEralized SImulating Shoreline change model (GENESIS) [12]. These models have been applied to analyze coastline changes and predict future shoreline locations. Due to the dynamic and complex nature of coastal areas, the predicted shoreline locations resulting from numerical models always contain considerable uncertainty [15]. Physical modeling involves the transfer of realistic coastal profiles to laboratory models through scale changes. For example, Adamo et al. [16] developed a model for predicting future coastlines by using water surface elevation, the directional wave spectrum, and historical coastline evolution information. However, the processes of data collection and model validation are time-consuming and laborious in physical modeling methods, and it is difficult to apply this type of method to long-term prediction scenarios [15]. Field measurement data analysis is based on data characteristics and was recently recognized as a more efficient and reliable technique for studying coastline changes [17]. For example, statistical models such as the end point rate (EPR) and linear regression (LR) models were used to calculate the change rates of coastlines for predicting future positions on Dr. Abdul Kalam Island in the Bay of Bengal, India [18]. Ciritci and Türk [19] combined the EPR, linear regression rate (LRR), and weighted linear regression (WLR) methods with a Kalman filter to predict coastlines from long time series Landsat satellite images in the Gulf of Izmit and the Göksu Delta. In addition, a semiautomatic spatial uncertainty algorithm called shoreline prediction with spatial uncertainty mapping (SUP-SUM) was proposed for coastline detection and future prediction [20]; this approach mainly uses the snake algorithm to extract existing coastlines and uses LR to predict future coastlines on the coast of Kumluca, a dynamic coastal area in Turkey. Although an LR model is suitable for the prediction of coastline areas without human activities [17], the process of estimating coastline change rates includes a considerable degree of uncertainty. LR models heavily rely on linear assumptions and tend to ignore the effects of nonlinear trends. Therefore, these models have limitations in terms of accurately capturing the complex dynamics associated with coastline changes [21].
Recently, machine learning methods have achieved unprecedented advances in numerous fields, such as image classification and accurate prediction [22]. The original support vector machine (SVM) [23] was proposed by Cortes and Vapnik to address both linear and nonlinear problems and was developed with the aim of minimizing structural risk based on statistical learning theory, demonstrating excellent generalization capabilities. When an SVM is employed for regression prediction, the method is commonly referred to as support vector regression (SVR). For example, Tuia et al. [24] proposed a multioutput SVR (M-SVR) method to estimate chlorophyll contents, leaf area indices, and vegetation coverage levels in remote sensing images. Most studies have revealed that the SVR and extended SVR approaches exhibit superior accuracies to those of some linear regression and polynomial regression methods [25][26][27]. Therefore, the SVR method is examined in this paper to assess its applicability to coastline prediction.
Artificial neural networks (ANNs) were designed to replicate the human nervous system for training purposes and provide precise human-like activities; thus, they possess the ability to handle highly nonlinear data. However, the research and applications of neural networks in the field of coastline prediction are not extensive [28,29], and only a few methods have been proposed. These methods are relatively simple compared with those in the computer vision field, and their coastline prediction applications are still limited. Specifically, Zeinali, Dehghani, and Talebbeydokhti [15] used nonlinear autoregressive neural network (NAR-NET) and nonlinear autoregressive neural network with exogenous inputs (NARXNET) networks to model shoreline changes along the Narrabeen Coast, Australia, between 1980 and 2014. Compared with the radial basis function (RBF), generalized regression neural network (GRNN), and time delay neural network (TDNN) methods, the prediction results of the two new methods had higher accuracy and reliability while requiring fewer data. Pazou and Agbodoyetin [30] used the autoregressive integrated moving average (ARIMA) and long short-term memory (LSTM) methods to study the process of coastline evolution in the coastal area of Akpakpa and obtained the location of the coastline through a database composed of satellite images. The results showed that the ARIMA and LSTM techniques are both suitable for short-term coastline prediction in Akpakpa. In addition, Yin, Anh, and Mai [31] used NNAR and LSTM methods to predict coastline changes in surveillance camera images, and the LSTM model was employed to overcome the vanishing and exploding gradient problems of recurrent neural networks (RNNs). In conclusion, these models are effective at detecting coastline changes but still cannot learn the long-term dependencies in time-series data that coastline prediction requires.
A transformer is a prediction model based solely on attention mechanisms [32]. The method can be effectively applied to long-term or short-term prediction tasks and can consider the long-term dependencies among time-series data [33]. Many studies have used transformer models to predict time-series data, and accuracy tests have shown the results to be highly precise. For example, Wu, Green, and Ben [34] established a new time-series prediction method based on a transformer model and applied it to the prediction of influenza-like illness (ILI). The study revealed that transformers are better than the ARIMA and LSTM models as they can learn complex dependencies of various lengths from time-series data. Cai and Janowicz [35] proposed a traffic transformer architecture for traffic prediction and demonstrated that their method was superior to the existing methods in terms of accuracy and stability by evaluating the prediction accuracy achieved on two real-world traffic datasets. Zhang et al. [36] built a pure transformer model for four remote sensing datasets to detect image changes, and it could effectively extract global information to obtain the long-term dependencies between time-series data. In addition, Chen, Qi, and Shi [37] considered the relationships between pixels in remote sensing images and modeled context in the spatial-temporal domain based on a transformer to improve the efficiency and accuracy of their change detection results. A transformer model can overcome the difficulty of capturing long-term dependencies and learn the global information contained in time-series data more effectively; therefore, it can serve as a good choice for predicting future coastline changes.
This paper aims to (1) explore the application of a transformer model in predicting coastlines and compare its results with those of SVR and LSTM after identifying potentially suitable methods for delineating coastline locations and (2) analyze the primary factors influencing the accuracy of coastline prediction while evaluating the ability of the transformer model to predict the future location of the coastline. The findings of this research provide valuable technical support for the scientific management of coastlines.

Materials

Study Areas
Weitou Bay is selected as our study area (Figure 1). The area is located in southeastern Quanzhou City, a famous coastal city of Fujian Province in China [38]. The study area has a subtropical monsoon climate and lies close to the Taiwan Strait, where the tidal range has a large spatial variation, with a mean tidal range of approximately 4 m [39]. In addition, the semi-diurnal tide is dominant in the Weitou Bay area [40], and the tidal state at the time of image acquisition is shown in Table 1. The coastline in this region can be categorized into two types: the natural coastline, unaffected by human development, and the artificial coastline, which is affected by human activities [41,42]. The natural coastlines in the study area include bedrock coastlines, sandy coastlines, and silty coastlines [43], while the coastlines dominated by artificial dikes are taken as artificial coastlines [44]. The coastline types are shown in Figure 2.

Quanzhou has a history of two thousand years and is claimed to be one of the greatest ports in the world [45]. With the establishment of the national Belt and Road strategy, it has also become the birthplace of the 21st century Maritime Silk Road, and its construction will have a great impact on coastline change [46]. Quanzhou's "Maritime Silk Road" was built into a characteristic cultural tourism product [47], and most religious sites are located along the coast [48]. In addition, part of the natural coastline was turned into an aquaculture shoreline through reclamation, which has changed the curvature and length of the shoreline [49]. These construction plans and activities are the main reasons for the observed coastline changes.

Satellite Image Download
A total of 21 Landsat-5 Thematic Mapper (TM), Landsat-7 Enhanced Thematic Mapper Plus (ETM+), and Landsat-8 Operational Land Imager (OLI) images covering the period from 2010 to 2020 were acquired from the USGS archives (https://www.usgs.gov/programs/usgs-library/collections, accessed on 21 October 2021) and selected for their low cloud coverage levels, as shown in Table 1. The datum, projection, resolution, path/row, and file type of each image are WGS84, UTM Zone 50 N, 30 × 30 m, 119/43, and TIFF, respectively. Two images are selected annually, one from the first quarter and another from the third quarter, so that consecutive images in the time series are separated by an interval of approximately six months. This temporal spacing matches the typical capability of satellite-derived coastlines, which can resolve coastline variability occurring over time scales of six months or longer [50]. The images are in the L1TP format (radiometrically, geometrically, and topographically corrected) with reliable location accuracy. The first 20 scenes are used for model training, while the last scene is used for testing and accuracy evaluation purposes.

Data Preprocessing
As all Landsat images are from the USGS archive in the L1TP format with reliable location accuracy to support time-series analyses, image registration is not conducted. However, simulations have shown that atmospheric conditions, including water vapor concentrations and changes in optical thickness due to aerosols, play important roles and can affect digital number (DN) values [51]. Therefore, after radiometric calibration is used to convert the DN values to radiances, fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH), a MODTRAN4-based atmospheric correction algorithm provided by ENVI software [52,53], is applied to all Landsat images to compensate for atmospheric effects.
To reduce the effects of clouds, including thin cirrus clouds and the thin edges of clouds, as well as their shadows, the Fmask algorithm (version 3.2), which was developed for cloud, cloud shadow, and snow detection in Landsat 4-7, 8, and Sentinel-2 images, is applied to each individual image [54,55].
To remove the effects of water bodies and improve the efficiency of the calculation process, the water bodies are extracted and masked off in each image using the modified normalized difference water index (MNDWI) according to the following equation [56]:

$$\mathrm{MNDWI} = \frac{\rho_{green} - \rho_{mir}}{\rho_{green} + \rho_{mir}} \tag{1}$$

where $\rho_{green}$ is the reflectance of a green channel such as TM band 2 or OLI band 3, $\rho_{mir}$ is the reflectance of a middle-infrared channel such as TM band 5 or OLI band 6, and MNDWI is the resulting index value.
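The MNDWI masking step can be illustrated with a minimal NumPy sketch. The function names, the toy reflectance values, and the threshold of 0 used to separate water from land are assumptions made for illustration; the paper does not state the threshold it used.

```python
import numpy as np

def mndwi(green, mir):
    """Return (green - mir) / (green + mir); pixels with a zero denominator get 0."""
    green = np.asarray(green, dtype=float)
    mir = np.asarray(mir, dtype=float)
    denom = green + mir
    out = np.zeros_like(denom)
    np.divide(green - mir, denom, out=out, where=denom != 0)
    return out

def water_mask(green, mir, threshold=0.0):
    """Boolean mask that is True where MNDWI exceeds the (assumed) threshold."""
    return mndwi(green, mir) > threshold

# Toy 2x2 reflectance patches (hypothetical values)
green = np.array([[0.10, 0.08], [0.05, 0.00]])
mir = np.array([[0.02, 0.04], [0.20, 0.00]])
index = mndwi(green, mir)
mask = water_mask(green, mir)
```

Water reflects more strongly in the green band than in the middle-infrared band, so water pixels tend toward positive index values and land pixels toward negative ones.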

Method
The study workflow is shown in Figure 3. After performing image preprocessing, the instantaneous waterline in each remotely sensed image is extracted using an active contour model [57]. As tide-level data are also important factors for predicting wave-dominated coastal areas [58], the tide-level data from the Weitou tidal station are then applied to adjust the instantaneous waterline to a coastline. This method is called tide correction. After completing data normalization, the acquired sample data are split into a training set and a test set. The training set is used to train the transformer model and to tune its hyperparameters. Subsequently, the trained model is employed to predict outcomes using the test set as its input. Finally, the accuracy of the prediction results is assessed by comparing them with the accurate reference coastlines using three evaluation methods: the root mean square error (RMSE), receiver operating characteristic (ROC) curve-matching principle, and mean offset.

Instantaneous Waterline Extraction
In our study, the distance regularization level set evolution (DRLSE) model [59] is used to extract the instantaneous waterline via an energy functional formula:

$$E(\phi) = \mu \int_{\Omega} p(|\nabla \phi|)\,dx + \lambda \int_{\Omega} g\,\delta(\phi)\,|\nabla \phi|\,dx + \alpha \int_{\Omega} g\,H(-\phi)\,dx \tag{2}$$

where $\phi$ is the level set function defined on the image domain $\Omega$, $p(\cdot)$ is the double-well potential function, $\delta(\cdot)$ is the Dirac delta function, and $H(\cdot)$ is the Heaviside function. Additionally, $g$ is the edge indicator function, and $\mu, \lambda > 0$ and $\alpha \in \mathbb{R}$ represent the constant coefficients of each energy term. The expression of the double-well potential function $p(\cdot)$ is defined as follows:

$$p(s) = \begin{cases} \dfrac{1}{(2\pi)^2}\left(1 - \cos(2\pi s)\right), & s \le 1 \\ \dfrac{1}{2}(s - 1)^2, & s \ge 1 \end{cases} \tag{3}$$

The edge indicator function $g$ is defined as follows:

$$g = \frac{1}{1 + |\nabla G_{\sigma} * I|^2} \tag{4}$$

where $G_{\sigma}$ is the Gaussian kernel with parameter $\sigma$, and $I$ is the image function. In Equation (2), the first term of the energy functional represents the penalty energy term, which is commonly referred to as the distance regularization term. The inclusion of the distance regularization term effectively mitigates the irregular motion of the active contour during the evolution process, eliminating the need for repeated initialization and significantly enhancing the efficiency of image segmentation. The numerical solution of this model is based on the finite difference method, and the time step $\Delta t$ must satisfy the Courant-Friedrichs-Lewy (CFL) condition for convergence [60]. Toure et al. employed the DRLSE model to extract coastlines with specific experimental parameters [61], including $\mu = 0.2$, $\lambda = 5$, $\alpha = 3$, and $\Delta t = 1$. In this study, we adopt identical parameter settings in the experiments to maintain consistency. Initially, the DRLSE model is employed to obtain the instantaneous waterline, which evolves gradually from the initially defined rectangular active contour toward the feature boundary. At this stage, the extracted instantaneous waterline effectively demarcates the boundary between the land and water areas, but some subtle deviations remain. Subsequently, the acquired edge contour is refined through manual visual interpretation, resulting in a more accurate instantaneous waterline.
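Two building blocks of the DRLSE energy, the double-well potential and the edge indicator function, can be sketched in NumPy as follows. This is only an illustration of those two functions, not the full level set evolution; the helper names, the Gaussian width of 1.5, and the toy step image are assumptions.

```python
import numpy as np

def double_well_potential(s):
    """Double-well potential p(s) with minima at s = 0 and s = 1."""
    s = np.asarray(s, dtype=float)
    small = (1.0 - np.cos(2.0 * np.pi * s)) / (2.0 * np.pi) ** 2
    large = 0.5 * (s - 1.0) ** 2
    return np.where(s <= 1.0, small, large)

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(image, sigma):
    """Separable Gaussian smoothing with edge padding; output keeps the input shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(image, dtype=float), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def edge_indicator(image, sigma=1.5):
    """g = 1 / (1 + |grad(G_sigma * I)|^2); near 0 on strong edges, 1 in flat areas."""
    gy, gx = np.gradient(smooth(image, sigma))
    return 1.0 / (1.0 + gx ** 2 + gy ** 2)

# Toy image: land (0) on the left half, water (100) on the right half
image = np.zeros((20, 20))
image[:, 10:] = 100.0
g = edge_indicator(image)
```

Because g is small along the land-water boundary, the edge-weighted terms of the energy stop driving the contour there, which is what lets it settle on the waterline.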

Tidal Correction
The line extracted from a remote sensing image is the instantaneous waterline, which is strongly affected by the tide; this effect must be accounted for in long-term coastline predictions. The tidal correction is based on the relationship between beach topography and tidal level changes: the tidal height at the time of satellite imaging, the tidal datum, and the beach slope are used to calculate the horizontal distance from the instantaneous water boundary to the high tide line, so as to infer the location of the coastline. The tidal correction method assumes a constant slope across the coastline correction range, and the horizontal shift of the instantaneous waterline [50] is computed by the following formula:

$$\Delta x = \frac{z_{ref} - z_{tide}}{m} \tag{5}$$

where $\Delta x$ is the cross-shore horizontal shift, $z_{ref}$ is the reference tidal datum, $z_{tide}$ is the tidal elevation at the satellite image acquisition time, and $m$ is a characteristic beach-face slope.
The reference tidal datum and the tidal elevation are collected from the China National Seamen Service website (https://www.cnss.com.cn/tide/, accessed on 21 November 2021).
The tidal datum at Weitou Bay is 322 cm below mean sea level. In this paper, the tidal-level data of the Tide Station of Weitou Bay located in the study area from 2010 to 2020 are collected in Table 1. The slope $m$ is calculated from two instantaneous waterlines extracted at different times:

$$m = \frac{|z_1 - z_2|}{d} \tag{6}$$

where $z_1$ and $z_2$ are the tidal elevations at the times when the satellite images are acquired, and $d$ is the average distance between the two extracted instantaneous waterlines.
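The two tidal-correction steps, estimating the slope from two waterlines and then shifting a waterline to the reference datum, amount to simple arithmetic. The sketch below uses hypothetical elevations and distances in metres, and the function names are illustrative rather than taken from the paper.

```python
def beach_slope(z1, z2, distance):
    """Slope m from two tidal elevations and the mean separation of the two waterlines."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return abs(z1 - z2) / distance

def horizontal_shift(z_ref, z_tide, slope):
    """Cross-shore shift that moves the instantaneous waterline to the reference datum."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return (z_ref - z_tide) / slope

# Hypothetical values: tides of 1.2 m and 3.6 m, waterlines 60 m apart on average
m = beach_slope(z1=1.2, z2=3.6, distance=60.0)          # about 0.04
dx = horizontal_shift(z_ref=5.0, z_tide=3.0, slope=m)   # about 50 m
```

A gentler slope produces a larger horizontal shift for the same tidal difference, which is why the correction matters most on flat sandy and silty coasts.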

Transformer Model
The overall architecture of the transformer model is depicted in Figure 4. The transformer model mainly consists of input, encoder block, decoder block, and output components, among which the encoder and decoder blocks are two critical parts [62]. First, we denote the time-series images as $X = \{x_1, x_2, \ldots, x_t\}$, where each $x_i \in \mathbb{Z}^{h \times w}$ corresponds to an image matrix at the $i$th time stamp. Here, $t$ represents the total number of time-series images, while $h$ and $w$ represent the numbers of rows and columns in each image, respectively. Specifically, the pixel values in the training images are only 0 and 255 in the coastline prediction application investigated in this study, where 0 represents the background (non-coastline area) and 255 signifies a coastline. Second, considering the self-correlation of the multihead self-attention layer in the transformer model, we replace the 0 values in the image matrix with infinitesimal nonzero values so that the self-correlation attention layers are effective and meaningful when the dot product operations are executed. The processed nonzero time-series image matrix is denoted as $\tilde{X} = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_t\}$. Third, we reshape each image matrix $\tilde{x}_i$ into a one-dimensional vector $h_i = [p_i^1, p_i^2, \ldots, p_i^n]$, where $n$ denotes the number of pixels in each image matrix, which equals $h \times w$. Fourth, the time-series sequence $H = \{h_1, h_2, \ldots, h_t\}$ is divided into training samples and prediction samples, which undergo positional encoding and are fed into the encoder blocks and decoder blocks, respectively.
The encoder block is a stack of encoders, typically six layers deep [32]. Each encoder consists of a multihead self-attention sublayer and a fully connected feedforward sublayer. The multihead self-attention sublayer obtains the attention weights by calculating the similarity between the queries and the keys, both of which are essentially transformed versions of the input data. The attention output is then computed as the sum of the values weighted by these attention weights. The fully connected feedforward sublayer contains two linear transformations and a rectified linear unit (ReLU) activation function. A residual connection is employed around each sublayer, followed by layer normalization. The output of the final encoder is a continuous time-series sequence, which accounts for the self-correlation characteristics of the input sequence. The encoded data serve as an interactive input for the subsequent decoder blocks, helping the model generate the predicted results.
Similarly, the decoder block also includes six decoders. In contrast to each encoder layer, each decoder incorporates an additional sublayer that performs multihead attention over the output of the encoder. In addition, the multihead self-attention mechanism of the decoder employs a mask to prevent the time-series data prediction process from learning future information. Finally, the decoder output is passed through a linear transformation layer and a softmax layer, and the prediction results are derived by applying the matrix transformation of the input layer in reverse.
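The attention computation and the mask that hides future time steps can be sketched with NumPy. This is a simplified single-head version without learned projections, given only to show the mechanics; it is not the paper's PyTorch implementation, and the sequence length and feature size are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_mask(t):
    """Boolean (t, t) matrix; True where step i may attend to step j <= i."""
    return np.tril(np.ones((t, t), dtype=bool))

def attention(q, k, v, mask=None):
    """Scaled dot-product attention for one head; q, k, v have shape (t, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        # Masked positions get -inf so that their softmax weight is exactly 0
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
t, d = 5, 8
x = rng.normal(size=(t, d))
out, w = attention(x, x, x, mask=causal_mask(t))
```

With the mask applied, each row of the weight matrix is zero above the diagonal, so the output at a given time step depends only on that step and earlier ones.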
Furthermore, in the masked multihead attention layer of the canonical transformer, the similarities between the queries and keys are calculated based on their pointwise values, which makes the attention insensitive to local context and prone to being misled by abnormal observations. Therefore, convolutional self-attention is used to mitigate this problem [63]. The model can better understand the local context by converting inputs into queries and keys using a causal convolution with a kernel size of k and a stride of 1. Calculating similarity from local context information is helpful for obtaining accurate predictions.
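A causal convolution with kernel size k and stride 1 can be sketched as below: the sequence is padded with k - 1 zero rows on the left so that the output at step i sees only steps i - k + 1 through i. The function name, the random weights, and the shapes are illustrative assumptions; in practice the kernels are learned.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution with stride 1.

    x has shape (t, d_in); kernel has shape (k, d_in, d_out).
    Output row i depends only on x[i-k+1 .. i].
    """
    k = kernel.shape[0]
    padded = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    out = np.zeros((x.shape[0], kernel.shape[2]))
    for i in range(x.shape[0]):
        window = padded[i:i + k]                      # (k, d_in), ends at step i
        out[i] = np.einsum("kd,kdo->o", window, kernel)
    return out

rng = np.random.default_rng(1)
t, d, k = 6, 4, 3
x = rng.normal(size=(t, d))
w_q = rng.normal(size=(k, d, d))   # hypothetical query kernel
queries = causal_conv1d(x, w_q)
```

Setting k = 1 reduces this to the pointwise projection of the canonical transformer, so the kernel size controls how much local context enters each query and key.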

Redefined Image Format
To binarize the extracted coastline image for coastline extraction purposes, we assign a value of 255 to coastline pixels and a value of 0 to non-coastline pixels. A total of 20 images acquired from 2010 to 2019 are read to obtain an image pixel value matrix. Each pixel value matrix is mapped to a vector with 124,956 rows and 1 column, and a matrix with 124,956 rows and 20 columns is obtained after the combination operation. After determining the relative optimal hyperparameters, such as the time-step size, the number of epochs, and the kernel size k of the causal convolutions, the matrix with 124,956 rows and 20 columns is input into the transformer model for training. The trained network is used to predict and output a vector with 124,956 rows and 1 column based on the coastline images of the previous years. Then, the pixel value matrix of the coastline image in the first quarter of 2020 is obtained through inverse mapping. Finally, the coastline prediction image for that year is output.
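The mapping from images to a pixel-by-time matrix and back can be sketched with NumPy on a tiny example. The image size of 4 × 5, the epsilon that replaces zeros, and the threshold of 127.5 used to re-binarize a predicted vector are assumptions chosen for illustration; the paper's images yield 124,956-element vectors.

```python
import numpy as np

EPS = 1e-6  # assumed stand-in for the small nonzero value that replaces 0

def images_to_matrix(images, eps=EPS):
    """Stack t binary (h, w) images into an (h*w, t) float matrix, one column per date."""
    cols = []
    for img in images:
        v = np.asarray(img, dtype=float).reshape(-1)
        cols.append(np.where(v == 0, eps, v))
    return np.stack(cols, axis=1)

def vector_to_image(vec, shape, threshold=127.5):
    """Inverse mapping: reshape a predicted vector and re-binarize it to 0/255."""
    img = np.asarray(vec, dtype=float).reshape(shape)
    return np.where(img > threshold, 255, 0).astype(np.uint8)

h, w, t = 4, 5, 3
images = []
for i in range(t):
    img = np.zeros((h, w), dtype=np.uint8)
    img[:, 1 + i] = 255            # a vertical toy coastline drifting to the right
    images.append(img)

matrix = images_to_matrix(images)                   # shape (20, 3)
restored = vector_to_image(matrix[:, -1], (h, w))   # recovers the last image
```

Row-major flattening followed by a reshape to the same shape is lossless, so the only information changed by the round trip is the zero replacement, which the threshold undoes.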

Training Strategies of SVR, LSTM, and Transformer Methods
In this paper, the transformer model is compared with SVR, a machine learning method, and LSTM, a deep learning method, both of which have been proven to be robust coastline prediction methods [31,64].
Specifically, we employ the coastline position images extracted from Landsat time-series images from 2010 to 2019 as the training data for the three methods and the image from the first quarter of 2020 as the test data. The image of the coastline location in the first quarter of 2010 is denoted $L^{q_1}_{2010}$, where $q_1$ represents the first quarter. In practice, in one typical training setup, we trained the model to predict 1 future coastline image from the 10 coastline images of the preceding 5 years. That is, the input of the model consisted of the coastline position images from the first quarter of 2010 to the third quarter of 2014, i.e., $L^{q_1}_{2010}$ to $L^{q_3}_{2014}$. The coastline images up to $L^{q_3}_{2019}$ are used to output the predicted coastlines $L^{q_3}_{2015}$ to $L^{q_1}_{2020}$, which is regarded as the test process, and the precision indexes of the predicted result $L^{q_1}_{2020}$ are evaluated against the real coastline.
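One plausible reading of this setup is a sliding window in which 10 consecutive scenes are the input and the next scene is the target. The sketch below builds such windows over scene labels only; the label format and function names are hypothetical, and the exact windowing used in the paper may differ.

```python
def quarter_labels(first_year, last_year):
    """Two scenes per year: first quarter (q1) and third quarter (q3)."""
    return [f"{y}-{q}" for y in range(first_year, last_year + 1) for q in ("q1", "q3")]

def sliding_windows(sequence, window=10):
    """Return (inputs, target) pairs: `window` consecutive items, then the next one."""
    return [
        (sequence[i:i + window], sequence[i + window])
        for i in range(len(sequence) - window)
    ]

labels = quarter_labels(2010, 2019) + ["2020-q1"]   # 21 scenes in total
pairs = sliding_windows(labels, window=10)
first_inputs, first_target = pairs[0]
last_inputs, last_target = pairs[-1]
```

Under this reading, the first window runs from the first quarter of 2010 to the third quarter of 2014, and the last window, ending at the third quarter of 2019, has the first quarter of 2020 as its target.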
We implement the SVR algorithm and optimize its hyperparameters using the sklearn and skopt packages in Python. The LSTM model is configured with three hidden layers and an output layer. The LSTM model and the transformer model are implemented in PyTorch and run on an NVIDIA GeForce RTX 2080 Ti GPU. The experiments are developed independently in Python using the PyCharm platform on a Windows 10 system.

Precision Validation
To evaluate the accuracy of the predicted coastline, this paper adopts three evaluation indices: the ROC curve-matching principle, the mean offset, and the RMSE. These indices are used to measure the agreement between the real coastline (IWE + TC) and the predicted coastline images.

ROC Curve-Matching Principle
The ROC curve-matching principle was proposed to evaluate coastline accuracy by matching linear targets; it is based on the spatial relationships along the coastline and takes the characteristics of the coastline into account [65]. This method first defines the accurately extracted coastline as the real coastline and then establishes a buffer zone with a buffer radius of n pixel units around the real coastline. The real coastline and the predicted coastline are then superimposed and analyzed. As shown in Figure 5, the true-positive ($TP_1$) represents the length of the predicted coastline that was successfully matched within the established buffer, and the false-positive ($FP$) represents the length of the predicted coastline that was not successfully matched. Similarly, a pixel-based buffer is built around the predicted coastline, and a subsequent coverage analysis is performed. In this case, $TP_2$ represents the actual coastline length that was successfully matched in the buffer, while the false-negative ($FN$) represents the unmatched actual coastline length. Finally, the following parameters are used to evaluate the model accuracy:

$$\mathrm{Complete} = \frac{TP_2}{TP_2 + FN} \tag{7}$$

$$\mathrm{Correct} = \frac{TP_1}{TP_1 + FP} \tag{8}$$

$$\mathrm{Quality} = \frac{TP_1}{TP_1 + FP + FN} \tag{9}$$

where Complete represents the integrity of the coastline extraction results, Correct represents the accuracy of the coastline extraction results, and Quality is used to evaluate the extraction quality of the obtained coastline through the integration of correctness and integrity. The coastline obtained by instantaneous waterline extraction and tidal correction (IWE + TC) is used as the accurate reference coastline, and the coastline predicted by the transformer model is used as the coastline to be tested. The number of pixels that fall into the buffer zone and the number of pixels that do not fall into the buffer zone are used in place of the coastline length for the accuracy verification.
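The pixel-count version of these indices can be sketched in plain Python. The sketch assumes that a pixel counts as matched when it lies within the buffer radius (Euclidean distance, in pixels) of any pixel on the other line, and it uses a brute-force search that is only suitable for small examples; the coordinates are hypothetical and both lines must be non-empty.

```python
def within_buffer(points, reference, radius):
    """Count points lying within `radius` pixels of at least one reference pixel."""
    r2 = radius * radius
    matched = 0
    for (r, c) in points:
        if any((r - rr) ** 2 + (c - cc) ** 2 <= r2 for (rr, cc) in reference):
            matched += 1
    return matched

def buffer_metrics(real, predicted, radius=1):
    """Complete, Correct, and Quality from pixel counts instead of line lengths."""
    tp1 = within_buffer(predicted, real, radius)   # predicted pixels matched to the real line
    fp = len(predicted) - tp1                      # predicted pixels left unmatched
    tp2 = within_buffer(real, predicted, radius)   # real pixels matched to the predicted line
    fn = len(real) - tp2                           # real pixels left unmatched
    return {
        "complete": tp2 / (tp2 + fn),
        "correct": tp1 / (tp1 + fp),
        "quality": tp1 / (tp1 + fp + fn),
    }

# Toy lines as (row, col) pixels: the prediction drifts 3 columns away after row 5
real = [(r, 5) for r in range(10)]
predicted = [(r, 5) for r in range(6)] + [(r, 8) for r in range(6, 10)]
scores = buffer_metrics(real, predicted, radius=1)
```

In this toy case, 6 of the 10 predicted pixels and 7 of the 10 real pixels are matched, which gives a Complete value of 0.7, a Correct value of 0.6, and a Quality value of 6/13.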

Mean Offset and Root Mean Square Error
The mean offset and RMSE are commonly used in coastline accuracy verifications [66]:

$$\mathrm{Mean\ offset} = \frac{1}{n} \sum_{i=1}^{n} D_i \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} D_i^2} \tag{11}$$

where $n$ is the total number of sample points, and $D_i$ is the Euclidean distance from the $i$th coastline feature point in the reference coastline dataset to the extracted coastline. We calculate the Euclidean distances from the predicted coastline pixels to the accurate coastline pixels. Pixels with distances of less than one pixel are classified as successfully predicted pixels; conversely, pixels with distances larger than one pixel are classified as failed pixels. The ROC matching principle is adopted to establish a buffer area based on coastline extraction. The coastline obtained by IWE + TC is used as the accurate coastline, and the coastlines predicted by the three methods are used as the coastlines to be evaluated.
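Both statistics can be computed from nearest-pixel distances, as in the following plain-Python sketch. The nearest-neighbor search is brute force and the coordinates are hypothetical (row, col) pixels; distances are therefore in pixel units.

```python
import math

def nearest_distances(reference, extracted):
    """Euclidean distance from each reference pixel to the closest extracted pixel."""
    return [
        min(math.hypot(r - rr, c - cc) for (rr, cc) in extracted)
        for (r, c) in reference
    ]

def mean_offset(distances):
    """Arithmetic mean of the distances."""
    return sum(distances) / len(distances)

def rmse(distances):
    """Root mean square of the distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

reference = [(0, 0), (1, 0), (2, 0), (3, 0)]
predicted = [(0, 0), (1, 1), (2, 0), (3, 2)]
d = nearest_distances(reference, predicted)   # distances 0, 1, 0, 1
offset = mean_offset(d)
error = rmse(d)
```

Because the RMSE squares each distance before averaging, it penalizes a few large deviations more heavily than the mean offset does, so the RMSE is never smaller than the mean offset.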
According to the coastline images extracted and predicted in the first quarter of 2020, the accuracy of the prediction results is evaluated by the ROC curve-matching principle, mean offset, and RMSE, and an accuracy analysis is carried out in combination with the coastline type and tidal correction.

Coastline Extraction
Figure 6 illustrates the instantaneous waterlines obtained by applying the DRLSE model with manual correction to the given remote sensing image, followed by the derivation of the coastline through tidal correction. The accuracy of the coastline obtained by the IWE + TC method is verified through visual inspection. The instantaneous waterline is shown in blue, while the tide-corrected coastline is shown as the orange curve. A partial enlargement is provided in the lower-right corner.

Predicted Coastline
Figure 7 depicts the coastlines predicted by the transformer, SVR, and LSTM models for the first quarter of 2020. Each predicted coastline is compared with the real coastline: where the two overlap, the prediction is good, and where they do not, the prediction contains errors. To further analyze the details of the predicted coastlines, three typical regions are selected in Figure 7, and their overlaps with the accurate coastline are shown in Figure 8. In Region 1, the overlap comparison shows that the coastline predicted by the LSTM model does not coincide with the real coastline, while the results of the transformer and SVR models basically coincide with it. In Region 2, the coastlines predicted by the three models lie in approximately the same position, and the coastline predicted by the transformer model is closest to the real coastline. Specifically, the prediction error is measured along a line perpendicular to the real coastline, from the real coastline to its intersection with each predicted coastline. We select the point in Region 2 of Figure 7 where the disparity is most pronounced (represented by a black dotted line); at this point, the discrepancies between the real coastline and the coastlines predicted by the transformer, LSTM, and SVR methods are 55.34 m, 77.48 m, and 78.01 m, respectively. These differences all exceed one pixel (30 m). In Region 3, the coastlines predicted by the three models all coincide with the real coastline to within one pixel, which indicates that these models have the ability and potential to predict future coastlines.
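For reference, the Region 2 discrepancies can be expressed in pixel units by dividing each distance by the 30 m pixel size stated above (values rounded to two decimals):

```latex
\begin{align*}
\text{Transformer:}\quad & \frac{55.34\ \text{m}}{30\ \text{m/pixel}} \approx 1.84\ \text{pixels} \\
\text{LSTM:}\quad        & \frac{77.48\ \text{m}}{30\ \text{m/pixel}} \approx 2.58\ \text{pixels} \\
\text{SVR:}\quad         & \frac{78.01\ \text{m}}{30\ \text{m/pixel}} \approx 2.60\ \text{pixels}
\end{align*}
```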

Accuracy Evaluation
Table 2 shows the accuracy of the results predicted by the three methods, evaluated using the ROC curve-matching principle, mean offset, and RMSE metrics. We individually analyzed the pixels that represent the coastlines in the predicted images. For the Weitou Bay area, the correct, complete, and quality values of the coastline predicted by the transformer model for the first quarter of 2020 are 98.80%, 96.40%, and 95.24%, respectively. All pixels of the predicted coastline are considered sample points, and the calculated mean offset and RMSE are 0.32 pixels and 0.57 pixels, respectively, as shown in Table 2.
SVR and LSTM are used for comparison with the transformer model. The precision evaluations of the coastlines predicted for the first quarter of 2020 by the SVR and LSTM models are also shown in Table 2. The quality of the coastline predicted by SVR is 76.57%, which is 18.67 percentage points lower than that of the transformer model. The RMSE of the SVR method is 0.85 pixels, which is 0.28 pixels higher than that of the transformer model. Similarly, the quality of the coastline predicted by the LSTM method is 9.15 percentage points lower than that of the transformer model, and its RMSE is 0.22 pixels higher.

Discussions
Our study has demonstrated that the predictions of the three models generally coincide with the real coastline extracted by the IWE + TC method in most areas of the Weitou Bay region. In some challenging regions, the transformer model agrees most closely with the real coastline and outperforms the LSTM and SVR models.
In recent years, ecological security issues such as the lack of ecosystem services in the coastal zone of the Weitou Bay area have attracted great attention due to rapid urbanization and industrialization [67]. These human activities also affect coastline changes in the Weitou Bay area. Most research on coastline prediction has relied on traditional techniques such as numerical models, physical models, and field measurement data analysis, while the use of deep learning methods remains relatively limited. As one of the state-of-the-art deep learning models, the transformer model gives more accurate predictions of the coastline change trend in the Weitou Bay area, which can be further used in environmental protection against beach loss [68,69], the maintenance of coastal disaster warning systems [70], and the management of coastal land [71,72].
Observed coastline dynamics are frequently affected by tidal correction and coastline type [73,74]. Tidal variation and the elevation of the tidal flat strongly influence the precision of coastline location estimates and of the reconstructed historical coastline changes. Additionally, the type of coastline and the extent of land-use change directly affect coastline stability and are therefore closely tied to the accuracy of coastline prediction. The effects of these two factors on the prediction results are discussed in the following sections.

The Effect of Tidal Correction
Tidal correction plays an important role in coastline prediction. To demonstrate this, an ablation experiment is conducted. The experiment uses coastline data both before and after tidal correction to train and evaluate the transformer model and to predict the coastline for the coming year. The precision values in Table 3 show a clear improvement when tidal correction is applied.

Effects of Coastline Types on Prediction
Figure 9 shows the deviation of the coastline predicted for the first quarter of 2020 from the sample points of the accurate coastline for the same quarter. The y-axis (m) denotes the absolute error, and the x-axis denotes the sample points. The coastline sample points are taken at 200-m intervals, and the Euclidean distances from these sample points to the predicted coastline points are calculated. Figure 10 shows the distribution of coastline types in Weitou Bay in 2017, derived from Gaofen-2 images (the coastline types in the Weitou Bay area in 2020 were the same as those in 2017). Figure 9 clearly shows that the prediction results of the transformer model are better than those of the SVR and LSTM models, and that larger absolute errors occur at the 24th and 104th sample points. The 24th sample point is located in a sandy coastline area, and the fluctuation of this coastline type is relatively stable. Because the images used in the experiment have a resolution of 30 m per pixel, the errors of the SVR and transformer predictions at this point are equivalent to less than one pixel, whereas the error of the LSTM model is more than three pixels. Therefore, the LSTM method is insufficient for accurately predicting this area. The error at the 104th sample point changes abruptly: the errors of all three prediction methods exceed one pixel at this point. We find that this sample point is located within the grounds of a shipping company, which belongs to an artificial coastline area. In addition, Table 4 shows the RMSEs produced by the three prediction methods for the four coastline types. The RMSE of artificial coastlines is larger than those of bedrock, sandy, and silt coastlines. In summary, artificial coastline areas are often subject to frequent human-made reconstruction and are greatly affected by human activities. Such anthropogenic interference is difficult to predict, so effective prediction results cannot be obtained for these areas.
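The sampling and per-type comparison described above could be sketched as follows. This is only an illustration under stated assumptions: the reference coastline is treated as a polyline in metric coordinates, the helper names (`resample`, `rmse_by_type`) and the type labels are hypothetical, and the toy geometry is invented for the example rather than taken from the Weitou Bay data.

```python
from collections import defaultdict
from math import hypot, sqrt


def resample(polyline, step):
    """Return points spaced `step` metres apart along a polyline, from its first vertex."""
    samples = [polyline[0]]
    next_at = step      # along-line distance of the next sample
    travelled = 0.0     # along-line distance at the start of the current segment
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = hypot(x1 - x0, y1 - y0)
        while seg > 0 and next_at <= travelled + seg:
            t = (next_at - travelled) / seg
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += step
        travelled += seg
    return samples


def absolute_errors(samples, predicted_points):
    """Distance (m) from each reference sample to its nearest predicted point."""
    return [min(hypot(sx - px, sy - py) for px, py in predicted_points)
            for sx, sy in samples]


def rmse_by_type(errors, types):
    """Group errors by coastline type and return the RMSE of each group."""
    groups = defaultdict(list)
    for err, kind in zip(errors, types):
        groups[kind].append(err)
    return {kind: sqrt(sum(e * e for e in errs) / len(errs))
            for kind, errs in groups.items()}


# Toy data: a 1 km straight reference coastline; the prediction is 10 m off
# along the first stretch and 40 m off (more than one 30 m pixel) at the end.
reference = [(0.0, 0.0), (1000.0, 0.0)]
predicted = [(float(x), 10.0 if x < 700 else 40.0) for x in range(0, 1001, 50)]
samples = resample(reference, step=200.0)
errors = absolute_errors(samples, predicted)
types = ["sandy"] * 4 + ["artificial"] * 2   # hypothetical labels per sample
print(rmse_by_type(errors, types))
```

Grouping the errors by coastline type in this way gives one RMSE per type, which is the form of comparison reported in Table 4.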

Conclusions
This paper collects Landsat-8, Landsat-7, and Landsat-5 satellite data for the Weitou Bay area of Quanzhou city, Fujian Province, from the first and third quarters of each year from 2010 to 2020. After preprocessing the satellite images with ENVI, the instantaneous waterline of the first and third quarters of each year is extracted. The tide-level data acquired from the Weitou Bay tide station are then used to correct the extracted waterlines to obtain the real coastline positions used as the prediction reference. Finally, the pixel-by-pixel predictions of three models are used to deduce the coastline position for the Weitou Bay area in the first quarter of 2020. The transformer model used in this paper achieves better prediction accuracy than the SVR and LSTM models. In the ROC curve accuracy verification, the correct, complete, and quality values of the transformer method are 98.80%, 96.40%, and 95.24%, respectively, all above 90%, and the mean offset and RMSE are 0.32 pixels and 0.57 pixels, respectively. In addition, the necessity of tidal correction is verified by the comparative ablation experiment. Furthermore, the prediction performance of the transformer method for different types of coastlines in the Weitou Bay area is discussed. For natural coastlines, including sandy, silt, and bedrock coastlines, the prediction results have high accuracy and strong continuity. For artificial coastlines, including dams, wharves, and reclamation areas, some of the prediction results are inferior due to the influence of human activities.
The results obtained are not only crucial for analyzing coastline changes in the Weitou Bay area but can also be applied to other coastal regions. Methods for predicting future coastline locations are of great significance for environmental protection and land planning departments in managing land use and coastline development in coastal regions. However, some limitations remain, including the challenges of (1) tuning the transformer model, which calls for more efficient hyperparameter optimization methods; (2) retaining spatial information when mapping the image matrix to a vector; and (3) addressing issues of data collection and quality, which are affected by the limited satellite data and changes in coastal climate conditions.
In the future, the matrix transformation method can be improved to better preserve spatial information, and better hyperparameter optimization methods can be used to obtain better hyperparameter combinations and improve prediction efficiency. We also intend to improve the generalization capability and robustness of the transformer model by incorporating additional remote sensing time-series data covering a longer time span and a broader geographical area.

Figure 1. The map of the Weitou Bay study area in Quanzhou City, which is highlighted in a dotted rectangular box. The red point in the lower-left corner is the tide station of Weitou Bay.

Figure 2. Coastline types in the Weitou Bay area. These images were acquired by our research group during a field collection in the year 2020.

Figure 3. The proposed workflow chart for coastline prediction.

Figure 4. The architecture of the transformer model.

Figure 6. The extracted instantaneous waterline, the tide-corrected coastline, and a locally enlarged area.

Figure 7. Predicted coastlines and IWE + TC (real coastline) for the first quarter of 2020. Three typical regions are selected: Region 1, Region 2, and Region 3.

Figure 8. The overlapping maps of the coastlines predicted by the three models and the real coastlines in three typical regions. The SVR, LSTM, and transformer results are shown from left to right, and Region 1, Region 2, and Region 3 are shown from top to bottom. In these figures, black lines represent the real coastlines, and blue, green, and red lines represent the coastlines predicted by the SVR, LSTM, and transformer methods, respectively.

Figure 9. Deviation of the predicted coastlines in 2020 from the accurate coastline in 2020.

Figure 10. The coastline type distribution derived from Gaofen-2 images.

Table 2. The results of the precision evaluation.

Table 3. Comparison between the accuracy values achieved before and after tidal correction.

Table 4. RMSEs produced for four coastline types.