GMDH-based hybrid model for container throughput forecasting: Selective combination forecasting in nonlinear subseries
Introduction
With deepening economic globalization and increasingly frequent international trade, container transportation has played an important role in reducing transportation time and trade costs. Major ports around the world have therefore striven to develop containerization. World Bank statistics show that global container throughput reached 679.2647 million twenty-foot equivalent units (TEU) in 2014, an increase of 80% since 2005. With this upsurge in container transport, the boom in port construction has caused a number of problems, such as overcapacity and declining capacity utilization [29]. Container port construction requires a long lead time and large investment. Once over-construction occurs, excess capacity and failure to earn the expected profit lead to a huge waste of time and capital [51]. Therefore, predicting future container throughput is very important for adjusting the direction of port development, scheduling port operations, planning port scale, and reducing resource waste [9].
Research on container throughput forecasting began in the 1980s and has since developed considerably. Existing forecasting methods mainly comprise single models and hybrid approaches. Single models use only one model for prediction and can be divided into three types [50].
(1) Time series models include exponential smoothing (ES), autoregressive integrated moving average (ARIMA) [1], [38], seasonal autoregressive integrated moving average (SARIMA) [41], [31], [14], vector autoregressive (VAR) [45], decomposition approaches [35], and grey forecasting (GM) [7], [16], [28]. For example, Schulz and Prinz [41] forecasted the quarterly container throughput series in Germany using the SARIMA and Holt–Winters ES models, and the former performed slightly better than the latter. Syafi’i et al. [45] proposed a vector error correction model based on the VAR model to forecast container throughput and verified its good forecasting performance. Guo et al. [16] proposed an improved grey Verhulst model that overcomes a deficiency of the GM(1,1) model, whose forecasting error increases when container throughput follows an S-shaped growth curve.
(2) Causal analysis models [42], [10], [34] include regression and elasticity coefficient analysis. For instance, Chou et al. [10] proposed an improved regression model to predict the container throughput of Taiwan Port; the model performed better than the traditional regression model. Patil and Sahu [34] used a regression model to predict the freight flow at Mumbai Port and carried out an elasticity analysis to identify the factors that affect freight flow.
(3) Nonlinear dynamic forecasting models include artificial neural networks [15], [26], [43], [37], [25], genetic programming (GP) [8], and support vector regression (SVR) [30]. For example, Gosasang et al. [15] predicted the container throughput of Bangkok Port more accurately with a multilayer perceptron neural network than with a linear regression model. Chen and Chen [8] adopted the GP, X-11, and SARIMA models to predict the container throughput of major ports in Taiwan, and the comparison showed that the GP model performed best. Mak and Yang [30] proposed an approximate least squares support vector machine (ALSSVM) model to predict the container throughput of the Hong Kong Port and compared it with the support vector machine (SVM), least squares SVM, and radial basis function neural network; the ALSSVM model was the best overall.
The container throughput time series is usually complex; thus, a single model based on linear assumptions or a nonlinear dynamic model often cannot obtain satisfactory forecasting performance [39], [21], [2]. An increasing number of researchers have constructed hybrid forecasting models [17], [13], [51], [52], [9], [53] to solve this problem. For example, Xie et al. [52] proposed three hybrid models based on the least squares support vector regression (LSSVR). The empirical analysis based on the container throughput time series of Shanghai and Shenzhen Ports showed that the proposed hybrid models performed better than the single models.
The above studies have paid significant attention to hybrid models because their prediction performance is usually better than that of single models. An efficient hybrid forecasting model may be constructed in three ways.
(1) The embedded method embeds one model into another, i.e., optimizes the parameters of one model with another [17], [13].
(2) The divide-and-conquer method first decomposes the original time series into several subseries, then models and predicts each subseries with an appropriate model, and finally integrates the prediction results according to certain rules [51], [52]. This method is popular. However, because some of the subseries decomposed from the original time series are often highly nonlinear, it is difficult to forecast them accurately with a single nonlinear model, and a large prediction error on some subseries leads to poor prediction performance for the entire time series.
(3) The combination forecasting method uses several models to predict the original time series and assigns a weight to the forecast of each model to obtain the final combined result [9], [53]. This method is convenient and simple; however, it usually combines all trained models, and multicollinearity among them may degrade the forecasting accuracy. Forecasting performance can be improved by selecting and combining the forecasts of only a subset of the models, but how to derive an optimal subset remains a challenge.
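The weighting step of combination forecasting can be illustrated with a minimal sketch. The text does not say which weighting scheme the cited studies use, so the inverse-MSE weights below are an assumption chosen for simplicity; the function names, model names, and numbers are all hypothetical.

```python
from typing import Dict, List


def mse(actual: List[float], predicted: List[float]) -> float:
    """Mean squared error between two equally long series."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def inverse_mse_weights(
    actual: List[float], forecasts: Dict[str, List[float]]
) -> Dict[str, float]:
    """Weight each model by 1/MSE on a validation period (assumed scheme)."""
    eps = 1e-12  # guards against division by zero for a perfect forecast
    inverse = {name: 1.0 / (mse(actual, f) + eps) for name, f in forecasts.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(
    forecasts: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted sum of the individual forecasts at each horizon step."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(horizon)
    ]


# Synthetic illustration: weights are learned on a validation period
# and then applied to forecasts for later periods.
validation_actual = [10.0, 12.0, 14.0]
validation_forecasts = {
    "model_a": [11.0, 13.0, 15.0],
    "model_b": [8.0, 10.0, 12.0],
}
future_forecasts = {"model_a": [16.0, 18.0], "model_b": [15.0, 16.0]}

weights = inverse_mse_weights(validation_actual, validation_forecasts)
combined = combine(future_forecasts, weights)
print(weights, combined)
```

Note that this scheme gives every trained model a nonzero weight, which is exactly the situation in which correlated models can hurt accuracy; a selective approach would first drop some models.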
This study combines the latter two hybrid forecasting methods and proposes a hybrid forecasting model based on the group method of data handling (GMDH) [20], called HFMG. The model first decomposes the original time series into two parts: a linear trend and a nonlinear variation. The linear trend reflects the increase in container throughput caused by long-term economic growth and is predicted by the SARIMA model. The nonlinear variation mainly comes from irregular changes and shocks from various factors in the economic system. Given the complexity of the nonlinear variation, the model trains three nonlinear single models, namely SVR, the BP neural network, and GP, and applies combination forecasting to them to remedy the shortcoming of the second method. These models are chosen because they have been widely used to forecast container throughput and have proven effective [41], [31], [14], [37], [25], [8], [30]. Furthermore, within the combination forecasting step, the model introduces the GMDH neural network proposed by Ivakhnenko [20] to compensate for the deficiency of the third method and establish selective combination forecasting. The factor screening function of the GMDH neural network can objectively and automatically choose the factors that critically influence the research object [49]. Thus, GMDH can reduce the effect of multicollinearity on model performance to some extent. Finally, the predictions of the two parts are integrated to obtain the forecast of the original container throughput time series. The empirical analysis verifies the effectiveness of the HFMG model.
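The idea of selective combination can be sketched in the spirit of a single GMDH layer: candidate combiners are fitted on one part of the data and ranked on a separate checking part (an external criterion), and only the best candidate survives. This is a simplified sketch under stated assumptions, not the HFMG procedure itself: it uses linear pairwise combiners instead of the quadratic polynomials common in GMDH, builds only one layer, and runs on synthetic numbers. The names `svr`, `bp`, and `gp` merely label hypothetical forecasts of the nonlinear subseries.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_pair(x1: List[float], x2: List[float], y: List[float]) -> List[float]:
    """Least-squares fit of y ~ c0 + c1*x1 + c2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)


def select_best_pair(
    forecasts: Dict[str, List[float]], target: List[float], n_fit: int
) -> Tuple[Tuple[str, str], List[float], float]:
    """Fit every pair on the first n_fit points; rank by MSE on the rest."""
    if len(forecasts) < 2:
        raise ValueError("need at least two candidate forecasts")
    results = []
    for name1, name2 in combinations(forecasts, 2):
        x1, x2 = forecasts[name1], forecasts[name2]
        coef = fit_pair(x1[:n_fit], x2[:n_fit], target[:n_fit])
        errors = [
            (coef[0] + coef[1] * u + coef[2] * v - t) ** 2
            for u, v, t in zip(x1[n_fit:], x2[n_fit:], target[n_fit:])
        ]
        results.append(((name1, name2), coef, sum(errors) / len(errors)))
    return min(results, key=lambda item: item[2])


# Synthetic data: the target is the average of the first two forecasts,
# so the third forecast carries no useful information.
svr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
bp = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
gp = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
target = [0.5 * (u + v) for u, v in zip(svr, bp)]

pair, coef, check_mse = select_best_pair(
    {"svr": svr, "bp": bp, "gp": gp}, target, n_fit=5
)
print(pair, coef, check_mse)
```

Ranking on held-out points rather than on the fitting points is what lets the procedure discard an uninformative forecast instead of assigning it a small but nonzero weight.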
This paper is organized as follows. Section 2 briefly reviews several commonly used time series forecasting models and the basic theory of the GMDH neural network. Section 3 describes the modeling idea and detailed modeling steps of the proposed HFMG model. Section 4 compares the performance of the HFMG model with that of other hybrid models on two actual container throughput series and provides further out-of-sample forecasts. Section 5 concludes the study.
Section snippets
SARIMA
The ARIMA model proposed by Box and Pierce [5] in the 1970s is a commonly used time series forecasting model. To account for seasonal patterns in a time series, the ARIMA model is extended to the SARIMA model [4]. A SARIMA process can be denoted as SARIMA(p, d, q)(P, D, Q)_s, which in its standard form is expressed as follows:

$$\phi_p(L)\,\Phi_P(L^s)\,(1-L)^d\,(1-L^s)^D\,Y_t = \theta_q(L)\,\Theta_Q(L^s)\,\varepsilon_t$$

where {Y_t} denotes the time series, L is the lag operator, s is the period of series, p and q are the autoregressive and moving average process orders,
Basic idea
Time series from economic systems are usually complex, containing both a linear trend and nonlinear variation. Therefore, the linear trend can be predicted by a linear model, and the nonlinear characteristics hidden in the time series can be captured by nonlinear forecasting models. Existing studies usually adopt a single nonlinear forecasting model, such as SVR, GP, or the BP neural network. However, each of these nonlinear models has its own advantages and disadvantages. Combining the forecasting
Data and evaluation criteria
This study utilizes the monthly container throughput data of the Xiamen and Shanghai Ports in China from January 2001 to December 2015 for the experiment, and takes the data from January 2001 to December 2014 as the training set and the data in 2015 as the test set. Data are collected from the official website of the Ministry of Transport of the People's Republic of China. Fig. 6 shows the original container throughput time series of the two ports. The two series have strong cyclicality and
Conclusion
The accurate forecasting of container throughput is important for the investment and construction of port infrastructure, as well as in the operation and management of ports. This study proposes a hybrid forecasting model HFMG, which decomposes the container throughput time series into linear and nonlinear parts. The linear part is predicted by the SARIMA model. The model adopts three nonlinear single models, SVR, BP, and GP, to predict the residual subseries considering the complexity of
Acknowledgements
This research is partly supported by the Natural Science Foundation of China [grant numbers 71471124 and 71501136]; Excellent Youth Fund of Sichuan University [grant numbers skqx201607, sksyl201709, and skzx2016-rcrw14]; MOE Youth Project of Humanities and Social Sciences [grant number 15YJC860034]; Natural Science Foundation of Anhui Higher Education Institutions [grant number KJ2016A604]; Youth Backbone Visiting Research Key Project at Abroad [grant number gxfxZD2016219]; and Key Project of
References (53)
- et al. Application of back-propagation networks in debris flow prediction. Eng. Geol. (2006)
- et al. Forecasting container throughputs at ports using genetic programming. Expert Syst. Appl. (2010)
- et al. A modified regression model for forecasting the volumes of Taiwan's import containers. Math. Comput. Modell. (2008)
- et al. A comparison of traditional and neural networks forecasting techniques for container throughput at Bangkok Port. Asian J. Ship. Logist. (2011)
- et al. Another look at measures of forecast accuracy. Int. J. Forecast. (2006)
- et al. A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl. Soft Comput. (2011)
- et al. GMDH based back propagation algorithm to predict abutment scour in cohesive soils. Ocean Eng. (2013)
- et al. A comparison of univariate methods for forecasting container throughput volumes. Math. Comput. Modell. (2009)
- et al. Multivariant forecasting mode of Guangdong Province port throughput with genetic algorithms and back propagation neural network. Procedia Soc. Behav. Sci. (2013)
- et al. Hybrid approaches based on SARIMA and artificial neural networks for inspection time series forecasting. Transp. Res. E: Logist. Transp. Rev. (2014)
- Forecasting cargo growth and regional role of the Port of Hong Kong. Cities
- Cluster ensemble framework based on the group method of data handling. Appl. Soft Comput.
- Structure identification of Bayesian classifiers based on GMDH. Knowl. Based Syst.
- A dynamic classifier ensemble selection approach for noise data. Inf. Sci.
- Hybrid approaches based on LSSVR model for container throughput forecasting: a comparative study. Appl. Soft Comput.
- Forecasting Australia's International container trade
- A moving-average filter based hybrid ARIMA-ANN model for forecasting time series data. Appl. Soft Comput.
- Seasonal unit roots in aggregate U.S. data. J. Econ.
- Time Series Analysis: Forecasting and Control
- Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Am. Stat. Assoc.
- Application of grey-Markov model in predicting container throughput of Fujian Province. Adv. Mater. Res.
- Port cargo throughput forecasting based on combination model. Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2016)
- Support-vector networks. Mach. Learn.
- Seasonal integration and cointegration. J. Econ.
- Port throughput forecasting by using PPPR with chaotic efficient genetic algorithms and CMA. IEEE International Conference on Systems, Man, and Cybernetics
- Forecasting inflation rate in Kenya using SARIMA model. Am. J. Theor. Appl. Stat.