1. Introduction
The world economy has entered an important period of rapid development and globalization. As an important transportation hub connecting the sea and land, ports are a key link for each country to realize global international trade and construct a global logistics transportation system, international logistics, and supply chains. In view of the current situation of global economic development, ports play an important role in the development of the national economy. The operation of the port transport industry is a barometer of the country’s macroeconomy, and the fluctuation of port throughput reflects the development of world trade. Due to the high efficiency of container handling, fast turnover, packaging cost savings, and cargo loss reduction, the container-based transportation system has gradually become the main way of maritime transportation. The container is an important index to evaluate the comprehensive capability of a port. Accurate prediction of container throughput is of great significance to port planning, operation, and decision-making and regional economic development. On the other hand, container throughput is one of the important factors affecting port carbon emissions, and accurate prediction of container throughput plays an important role in studying port carbon emissions.
There are many scholars using a variety of different methods to predict container throughput. The commonly used prediction methods include traditional models (multiple regression, exponential smoothing, grey prediction, neural networks, combinatorial and autoregressive models, etc.) and machine learning techniques (support vector machines, random forests, deep learning algorithms, etc.). Although these methods have achieved certain predictive effects, they also have some limitations. For example, multiple linear regression models explore the linear relationship between variables based on fixed mathematical models. However, in many practical fields, especially in the economic field, there is often a nonlinear relationship between variables. The grey prediction model is prone to a rapid increase and decay and a lack of adaptive ability, so it is suitable for short-term prediction. The neural network model needs a large number of training samples. The autoregressive integrated moving-average (ARIMA) model can process linear time series data, but cannot capture nonlinear time series data. The kernel function selection of the support vector machine method is subjective. The combined models do not guarantee the size of the weights. Container throughput has the characteristics of randomness, volatility, and nonlinearity, which increase the difficulty of prediction. Therefore, it is urgent to find a simple, effective, and high-precision method to predict container throughput.
In recent years, the technological progress of computing tools and computational capacity has allowed us to deal with a large number of complex and high-dimensional data, which come from different practical fields. For example, the monthly data of container throughput, the daily data of temperature, the hourly data of electricity quantity, and the minute data of stock prices all have functional characteristics. These data are characterized as curves, shapes, and images, which are usually called functional data. Ref. [
1] proposed functional data analysis methods based on functional analysis techniques, the topology, and statistics, which are methods of processing functional data. The main idea is to regard the data observed once in the observation interval as a whole. Functional data form curves, surfaces, and images allow fully understanding the changes of the whole data, finding the change rules, mining the data information, and making data analysis more accurate. At present, functional data analysis has attracted much attention due to its wide application in many practical areas, such as biology, medical sciences, geophysics, meteorology, pattern recognition, and so on.
Container throughput data are a kind of nonlinear time series dataset. In order to achieve the purpose of prediction, the time series methods are very suitable. Time series are an active field in current statistical research because of their wide application in various practical fields. There are many time series methods, such as autoregressive, moving average, grey prediction, ARIMA, seasonal ARIMA models, and so on. However, no research results consider the functional characteristics of container throughput time series data. Recent research has attacked the functional time series problem by proposing either parametric (mainly linear) or nonparametric modeling. In order to overcome the constraint of the linear relationship between variables, in this paper, we propose nonparametric methods with more flexibility and applicability. In 1942, Wolfowitz first used the term nonparametric statistics, which mainly expanded the content of a parametric test so that the traditional test process could be applied to small samples and data of different distribution types. Nonparametric statistics is dedicated to solving the estimation and testing problems of unknown theoretical distributions and dealing with distribution-independent problems. Compared with parametric methods, nonparametric methods have wide applicability and robustness. The nonparametric approaches do not assume specific population distributions; they are applicable to data from any unknown population distribution and can describe more problems. Nonparametric methods have relatively few restrictions on the population distribution.
In order to take a high number of historical data into the time series model, functional data analysis is a reasonable method. The original time series was composed of 15 curves’ data, which are the measurements of container throughput in Shanghai Port during 15 years raging from 2007 to 2021. Hence, the time series can be viewed as a list of 12 functional data. We can use a single functional variable to forecast container throughput rather than 12 observed points. Hence, the prediction problem can be addressed through a nonparametric method with dependent functional time series. The aim of this paper was to develop several prediction models that combine the advantages of nonparametric methods with the merits of the functional idea to achieve the purpose of forecasting the yearly container throughput to compare the prediction accuracy with other classical methods. The results showed that the nonparametric functional data analysis methods outperformed other benchmark models.
2. Literature Review
Undoubtedly, an accurate port data analysis can improve a port’s operations and is helpful for the economic development of the port’s cities and the global economy. The investigations of container throughput forecasting have continued to deepen and provide a theoretical basis for the future development of ports. Classical regression models have traditionally been applied to forecast container throughput and can yield reasonably acceptable forecasting results. For example, Ref. [
2] considered the seasonal ARIMA model to forecast the monthly container throughput of the top 20 international container ports, and the results suggested that the seasonal ARIMA model can produce accurate and reliable throughput forecasting at major international ports. Ref. [
3] developed the grey model (GM) to forecast the port’s throughput, which generated more accurate results. Ref. [
4] applied the Fourier series to modify the GM to predict the cargo throughput of Kaohsing Port. Ref. [
5] collected the foreign trade data from 1996 to 2008 and constructed a back propagation-artificial neural network (BP-ANN) model to predict container throughput. Ref. [
6] proposed a hybrid method by combining projection pursuit regression and the genetic programming algorithm to forecast the container throughput of Qingdao Port. The results showed that the hybrid method significantly outperformed the ANN, seasonal ARIMA, and projection pursuit regression models. Ref. [
7] established the GM, triple exponential smoothing model, multiple linear regression model, and back propagation neural network models to forecast the container throughput of both Shanghai Port and Lianyungang Port. Ref. [
8] gave a comparison among some time series methods for forecasting container throughput. Ref. [
9] showed some univariate models to predict the container throughput of major ports in Asia.
Recently, machine learning approaches have allowed extracting hidden relationships from the real data and have become more and more important for economics data analysis. Many studies have been conducted to improve forecasting accuracy by using machine learning techniques. For example, Ref. [
10] proposed the support vector regression (SVR) model to consider potential influencing prediction to forecast container throughput. Ref. [
11] conducted several hybrid approaches by combining SVR with other methods to forecast container throughput. The results indicated that a good prediction depends on the seasonal nature and nonlinear characteristics of the historical data. Ref. [
12] employed a deep learning method to forecast the container throughput of Los Angeles Port. Ref. [
13] provided practical methods for forecasting the container throughout through the random forest and multilayer perception models. The results implied that the random forest model is a reasonable choice by comparison with seven competing models. Ref. [
14] considered a long short-term memory (LSTM) deep learning method and seasonal ARIMA model to forecast the comprehensive and route-based Shanghai Containerized Freight Index. The findings showed that the LSTM deep learning model outperformed the seasonal ARIMA model.
In the past two decades, functional data analysis has received considerable attention due to expressing each individual datum in repeatedly measured data as a smooth and continuous time processes and drawing information from functional data. There are a significant amount of works devoted to functional data analysis, especially in the fields of medicine, the environment, sports, economics, and so on. For example, Ref. [
15] treated the data related to the intraocular pressure of patients with right eye glaucoma by functional analysis and prediction based on smoothed functional principal component analysis (FPCA). Ref. [
16] proposed the ranking of functional data by several ranking methods and applied worldwide PM10 data to generate ranks. Ref. [
17] employed regularized optimization and functional principal component analysisfor prediction and estimation of sparse functional data and forecasting the behavior of basketball players. Ref. [
18] introduced several ordinal classification methods for multiargument and multivariate functional data. Their performance was analyzed on four real datasets of the 3D brain structure. Ref. [
19] explored the problems of detecting outliers in German electricity supply functions and classification of medical imaging data by partially observed functional data with an integrated functional depth.
Much statistical literature has focused on functional time series by proposing parametric and nonparametric modeling. Ref. [
20] considered the estimation and prediction of linear processes in the functional space. In order to reduce modeling biases caused by the misspecifying parametric models, there has been an upsurge in interest and effort in nonparametric modeling, which is helpful to explore hidden structures and reducing modeling biases. Nonparametric functional models have been investigated intensively in real data analysis. For example, Ref. [
21] studied the theoretical results of some kernel estimators of the model and how the procedure was well adapted to some spectrometric data. Ref. [
22] used the reproducing kernel Hilbert spaces framework to consider the functional nonparametric regression model. Ref. [
23] considered Gaussian processes methods for a nonparametric functional regression model and applied Leeds renal anemia data to confirm the usefulness of the method. The results showed that the proposed method outperformed the kernel method, functional additive model, penalized function-on-function regression, and functional linear model.
In order to import a large number of past values into a nonparametric time series model by applying the functional data idea, we need to cut the observed time series into one sample of trajectories and bring in the model one single past and continuous trajectory rather than many single past values. The literature on functional time series is quite extensive. For instance, Ref. [
24] estimated the conditional mode by maximizing a kernel estimator of conditional density for the response and almost achieved the convergence of the proposed estimator under
-mixing assumption. Ref. [
25] considered a semi-functional partial linear model for time series prediction. Ref. [
26] established the almost complete convergence and asymptotic normality of a nonparametric conditional mode estimator and illustrated the proposed method by using functional El Ni
o time series data. Ref. [
27] used a nonparametric learning approach based on the support vector machines technique to estimate functional quantiles and applied it to El Ni
o time series prediction. Ref. [
28] investigated
k-nearest-neighbor (kNN) estimation for a strong mixing functional time series model and established the uniform almost complete convergence rate of the proposed estimator. The finite sample performance and the usefulness of the kNN method were illustrated by an empirical application to a real data analysis of sea surface temperature. Ref. [
29] discussed kernel regression estimation for time series data with functional response and covariates. Ref. [
30] addressed the problem of nonparametric trend estimation for functional time series and applied the proposed method to annual mortality rates in France.
By summarizing the literature on forecasting container throughput, it can be found that the current research works are mainly concentrated on discrete time series and a linear assumption. There are few analyses of the functional characteristics of the time series dataset and a flexible structure. Therefore, the main contributions of this paper include two aspects: (1) We consider functional data analysis methods for the container throughput of Shanghai Port as continuous trajectoryof a single point, which finds more information from the functional time series dataset and produces more accurate predictions than nonfunctional methods. (2) Our research establishes a more flexible relationship between variables via nonparametric approaches to eliminate the linear constraint, which yields more accurate predictions and more extensive applications than parametric methods. Thanks to the advantages of functional data and nonparametric methods, we propose nonparametric functional data analysis approaches, namely the functional regression model, functional conditional quantiles model, and functional conditional mode model, to forecast the container throughput time series.
The remainder of this paper is organized as follows. We describe and preprocess the container throughput time series data of Shanghai Port from 2007 to 2021 in
Section 3. Three nonparametric functional data analysis methods and the corresponding kernel estimations are presented in
Section 4.
Section 5 compares the prediction accuracy of the nonparametric functional models with other benchmark models.
Section 6 gives the concluding comments and future research ideas.
3. Data
Shanghai Port is located in Shanghai, China; see
Figure 1 for the details. Shanghai Port gives full play to its advantages, supports the economic development of Shanghai, and realizes the positive interaction between the port and its cities. The container throughput of Shanghai Port reached 47 million TEUs on 1 January 2022, ranking first in the world for 12 consecutive years. This section focuses on an economic time series datasets. The Shanghai Port monthly container throughput covering a period from January 2007 to December 2021 (180 months) is plotted in
Figure 2. It can be seen that the container throughput has been seriously affected and fell sharply at the beginning of 2020 due to the outbreak of COVID-19.
It is obviously seen that the time series exhibits a linear trend and heterogeneity in the variance structure from
Figure 2. In order to eliminate these effects, we took a new time series by differentiating the log data. The transformed time series is displayed in
Figure 3.
In order to predict the future container throughput, the classical statistical methods take a finite number of historical data into account. It is reasonable to consider the continuous time series over some period as the explanatory variables. Hence, the explanatory variables were composed of 15 yearly continuous time series. The functional data are given in
Figure 4.
The first eigenspace both of the variables and units is given in
Figure 5. It is clear that the structure is visible, which confirms that the historical time series cannot be summarized by a small number of parameters. That is to say, it would be accurate if the whole of past years were used to predict the future values.
The descriptive statistic results are listed in
Table 1. Because the range of the original data is huge and has a clear linear trend, for the ease of calculation and correcting the non-normal distribution problem, as well as effectively eliminating the trend, we reduced the absolute value of the original data by using the natural differentiated logarithmic value.
The functional approach to time series forecasting consists of taking the historical explanatory data as a whole continuous path of the process. Without loss of generality, we assumed that
for some
and
. To clarify this, in the container throughput, we have
,
, and
. The observed time series can be cut into
n continuous paths of length
. Hence, we can build a new sample size of
in the following way:
where
is a real-valued time series, which has been observed
N equispaced times.
s is the horizon of prediction. The forecasting problem turns into a prediction question of the scalar response
given functional variable
.
6. Concluding Remarks
Container throughput time series data show strong functional characteristics with the passage of time. However, in the existing literature, only single past values ignore the continuity of the past values. In this research, we developed one new statistical method, nonparametric functional data analysis, to predict container throughput time series by taking into account one continuous set of past values as the predictors, which not only effectively avoided the limitation of the linearity assumption, but also could mine more data information and produce more accurate predictions.
A comparison among the FNR, FCQ, FCM, NR, FLR, and ARIMA models was considered and applied to forecast the container throughput time series of Shanghai Port. The models were proposed to forecast the container throughput of the 15th year, for which the historical dataset in a 14 year period is available. The forecasting accuracy results of the proposed models were presented in the form of the measurement criteria MSE, MAE, and MAPE. At the same time, the forecasting curves of all the prediction models were plotted. The conclusions suggested that: (1) functional data find more information than discrete points; (2) nonparametric methods are more widely applied and flexible than parametric methods; (3) functional time series analysis outperforms classical time series models. On the other hand, among these three nonparametric functional forecasting models, the best forecasting performance could be obtained by means of the FCQ model.
Due to the lack of scientific works focusing on forecasting container throughput by functional data analysis, the main contribution of this paper shows that the nonparametric functional models are reliable for forecasting container throughput. More accurate forecasting results help with management decisions.
It is interesting and challenging to study the main factors affecting the comprehensive level of ports by functional principal component analysis method, and classify ports based on the functional cluster analysis method, which will be used to compare with the traditional principal component analysis and cluster analysis. The goal is to classify ports more effectively and optimize port indicators more accurately.