Elsevier

Applied Soft Computing

Volume 62, January 2018, Pages 478-490
GMDH-based hybrid model for container throughput forecasting: Selective combination forecasting in nonlinear subseries

https://doi.org/10.1016/j.asoc.2017.10.033

Highlights

  • We propose a GMDH-based hybrid model to forecast the container throughput.

  • It first divides the original time series into linear trend and nonlinear residual.

  • It predicts the linear trend with the SARIMA method.

  • It forecasts the nonlinear residual by GMDH-based selective combination forecasting.

  • The empirical analysis shows that the model outperforms SARIMA and the hybrid SARIMA-SVR, SARIMA-GP, and SARIMA-BP models.

Abstract

Accurate forecasting of future container throughput is important for the construction, upgrade, and operation management of a port. This study introduces the group method of data handling (GMDH) neural network and proposes a hybrid forecasting model based on GMDH (HFMG) to forecast container throughput. The model decomposes the original container throughput series into two parts, linear trend and nonlinear variation, and uses the seasonal autoregressive integrated moving average (SARIMA) approach to predict the linear trend. Considering the complexity of forecasting the nonlinear subseries, the proposed model adopts three nonlinear single models, namely, support vector regression (SVR), back-propagation (BP) neural network, and genetic programming (GP), to predict it. Then, the model establishes selective combination forecasting with the GMDH neural network on the nonlinear subseries and obtains the combination forecasting results. Finally, the predictions of the two parts are integrated to obtain the forecasting results for the original container throughput time series. The container throughput data of Xiamen and Shanghai Ports in China are used for empirical analysis, and the results show that the HFMG model forecasts better than the SARIMA model and than hybrid forecasting models such as SARIMA-SVR, SARIMA-GP, and SARIMA-BP. Monthly out-of-sample forecasts of container throughput for the two ports throughout 2016 are also given.

Introduction

With the deepening of economic globalization and the growth of international trade, container transportation has played an important role in reducing transportation time and trade cost. Therefore, major ports around the world have striven to develop containerization. World Bank statistics show that global container throughput reached 679.2647 million twenty-foot equivalent units (TEU) in 2014, an increase of 80% since 2005. With the upsurge in container transport, the boom in port construction has resulted in a number of problems, such as overcapacity and declining capacity utilization [29]. Container port construction requires a long period and large investment. Once over-construction occurs, excess capacity and the failure to earn the expected profit lead to a huge waste of time and capital [51]. Therefore, predicting future container throughput is very important for adjusting the direction of port development, scheduling port operations, planning port scale, and reducing resource waste [9].

The research on container throughput forecasting began in the 1980s and has considerably developed. Existing forecasting methods mainly include single models and hybrid approaches. The single models use only one model for prediction and can be divided into three types [50].

(1) Time series models include exponential smoothing (ES), autoregressive integrated moving average (ARIMA) [1], [38], seasonal autoregressive integrated moving average (SARIMA) [41], [31], [14], vector autoregressive (VAR) [45], decomposition approach [35], and grey forecasting (GM) [7], [16], [28]. For example, Schulz and Prinz [41] forecasted the quarterly container throughput series in Germany using the SARIMA and Holt–Winters ES models, and the former performed slightly better than the latter. Syafi’i et al. [45] proposed a vector error correction model based on the VAR model to forecast container throughput and verified its good forecasting performance. Guo et al. [16] proposed an improved grey Verhulst model that overcomes a deficiency of the GM (1,1) model, whose forecasting error increases when container throughput grows along an S-curve.

(2) Causal analysis models [42], [10], [34] include regression and elasticity coefficient analysis. For instance, Chou et al. [10] proposed an improved regression model to predict the container throughput of Taiwan Port; the model performed better than the traditional regression model. Patil and Sahu [34] used a regression model to predict the freight flow at Mumbai Port and carried out an elasticity analysis to identify the factors that affect freight flow.

(3) Nonlinear dynamic forecasting models include artificial neural network [15], [26], [43], [37], [25], genetic programming (GP) [8], and support vector regression (SVR) [30]. For example, Gosasang et al. [15] achieved better results in predicting the container throughput of Bangkok Port with a multilayer perceptron neural network than with a linear regression model. Chen and Chen [8] adopted the GP, X-11, and SARIMA models to predict the container throughput of major ports in Taiwan, and the comparison showed that the GP model performed best. Mak and Yang [30] proposed an approximate least squares support vector machine (ALSSVM) model to predict the container throughput of the Hong Kong Port and compared it with the support vector machine (SVM), least squares SVM, and radial basis function neural network; the ALSSVM model was the best overall.

The container throughput time series is usually complex; thus, a single model based on linear assumptions or a nonlinear dynamic model often cannot obtain satisfactory forecasting performance [39], [21], [2]. An increasing number of researchers have constructed hybrid forecasting models [17], [13], [51], [52], [9], [53] to solve this problem. For example, Xie et al. [52] proposed three hybrid models based on the least squares support vector regression (LSSVR). The empirical analysis based on the container throughput time series of Shanghai and Shenzhen Ports showed that the proposed hybrid models performed better than the single models.

The above studies have paid significant attention to hybrid models because their prediction performance is usually better than that of single models. An efficient hybrid forecasting model may be constructed in three ways. (1) The embedded method embeds one model into another (i.e., optimizes the parameters of one model with another, as in Refs. [17], [13]). (2) The divide and rule method first decomposes the original time series into several subseries, then models and predicts each subseries with an appropriate model, and finally integrates the prediction results according to certain rules, as in Refs. [51], [52]. This method is popular. However, because some of the subseries decomposed from the original time series are often highly nonlinear, forecasting them accurately with a single nonlinear model is difficult, and a large prediction error on some subseries leads to poor prediction performance for the entire time series. (3) The combination forecasting method adopts several models to predict the original time series and assigns a weight to the forecasting results of each model to obtain the final combined results, as in Refs. [9], [53]. This method is convenient and simple; however, it usually combines all trained models, and multi-collinearity may exist among them, which degrades forecasting accuracy. Forecasting performance can be improved by selecting and combining the forecasts of only a subset of the models, but how to derive an optimal subset remains a challenge.
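To make the third approach concrete, the following is a minimal sketch of weighted combination forecasting. The inverse-MSE weighting rule, the function names, and the toy numbers are illustrative assumptions; they are not taken from Refs. [9], [53] or from this study.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def inverse_mse_weights(actual, forecasts, eps=1e-12):
    """Weight each model by the inverse of its in-sample MSE (assumed rule).

    `forecasts` maps a model name to its list of forecasts.
    `eps` guards against division by zero for a perfect model.
    """
    inverse = {name: 1.0 / (mse(actual, preds) + eps)
               for name, preds in forecasts.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted sum of all model forecasts at each time step."""
    horizon = len(next(iter(forecasts.values())))
    return [sum(weights[name] * forecasts[name][t] for name in forecasts)
            for t in range(horizon)]


# Toy data (hypothetical): three models forecasting the same four points.
actual = [1.0, 2.0, 3.0, 4.0]
forecasts = {
    "svr": [1.1, 2.1, 2.9, 4.1],
    "bp": [1.5, 2.5, 3.5, 4.5],
    "gp": [0.8, 2.2, 3.2, 3.8],
}
weights = inverse_mse_weights(actual, forecasts)
combined = combine(forecasts, weights)
```

Note that every model receives a nonzero weight here, which is exactly the behaviour that can suffer from multi-collinearity; a selective scheme would drop some models entirely.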

This study combines the latter two hybrid forecasting methods and proposes a hybrid forecasting model based on the group method of data handling (GMDH) [20], called HFMG. This model first decomposes the original time series into two parts: linear trend and nonlinear variation. The linear trend is the increase in container throughput caused by long-term economic growth and is predicted by the SARIMA model. The nonlinear variation mainly comes from the irregular changes and shocks of various factors in the economic system. Considering the complexity of the nonlinear variation, the model trains three nonlinear single models, namely, SVR, BP neural network, and GP, and implements combination forecasting to overcome the disadvantage of the second method. We choose these models because they have been widely used to forecast container throughput and have proven effective [41], [31], [14], [37], [25], [8], [30]. Furthermore, in combination forecasting, the model introduces the GMDH neural network proposed by Ivakhnenko [20] to compensate for the deficiency of the third method and to establish selective combination forecasting. The factor screening function of the GMDH neural network can objectively and automatically choose the factors that critically influence the research object [49]. Thus, GMDH can reduce the effect of multi-collinearity on the performance of the model to some extent. Finally, the predictions of the two parts are integrated to obtain the forecasting results for the original container throughput time series. The empirical analysis verifies the effectiveness of the HFMG model.
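The selection idea behind GMDH can be sketched as follows. This is a simplified, single-layer illustration and not the authors' implementation: it assumes linear partial models of the form y = a + b·x_i + c·x_j for every pair of base forecasts, fits each on the first part of the sample, and ranks the candidates by squared error on the held-out part (an external criterion). A full GMDH network typically uses polynomial partial models and stacks several such layers. All names and toy data below are hypothetical.

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X'X) coef = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def gmdh_select_pair(inputs, target, n_fit):
    """Fit each pairwise model on the first n_fit points and keep the pair
    with the lowest squared error on the remaining (held-out) points."""
    best = None
    for a, b in combinations(sorted(inputs), 2):
        X = [[1.0, u, v] for u, v in zip(inputs[a], inputs[b])]
        coef = fit_least_squares(X[:n_fit], target[:n_fit])
        held_out = [sum(c * v for c, v in zip(coef, row)) for row in X[n_fit:]]
        criterion = sum((p - t) ** 2
                        for p, t in zip(held_out, target[n_fit:]))
        if best is None or criterion < best[0]:
            best = (criterion, (a, b), coef)
    return best


# Toy data (hypothetical): the target depends on "svr" and "gp" only.
inputs = {
    "svr": [0.5, -1.0, 1.5, 0.2, -0.7, 1.1, -0.3, 0.8, -1.2, 0.4, 0.9, -0.6],
    "gp": [-0.4, 0.9, 0.3, -1.1, 0.6, -0.2, 1.3, -0.8, 0.1, 1.0, -0.5, 0.7],
    "bp": [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0],
}
target = [0.6 * s + 0.4 * g for s, g in zip(inputs["svr"], inputs["gp"])]
criterion, pair, coef = gmdh_select_pair(inputs, target, n_fit=8)
```

In this toy setting the uninformative "bp" series is screened out because any pair containing it scores worse on the held-out points, which mirrors the factor screening role described above.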

This study is organized as follows. Section 2 briefly reviews several commonly used time series forecasting models and the basic theory of the GMDH neural network. Section 3 describes the modeling idea and detailed modeling steps of the proposed HFMG model. Section 4 compares the performance of the HFMG model and other hybrid models in two actual container throughput series and gives further out-of-sample forecasting. Section 5 concludes this study.

Section snippets

SARIMA

The ARIMA model proposed by Box and Pierce [5] in the 1970s is a commonly used time series forecasting model. Considering the seasonal patterns in the time series, the ARIMA model is extended to the SARIMA model [4]. A SARIMA process can be denoted as SARIMA(p, d, q)(P, D, Q)_s, which is expressed as follows.

$$\Phi(L)\,\Phi(L^{s})\,(1-L)^{d}\,(1-L^{s})^{D}\,Y_t = \Theta(L)\,\Theta(L^{s})\,\varepsilon_t,$$

where $\{Y_t\}$ denotes the time series, $L$ is the lag operator, $s$ is the period of series, p and q are the autoregressive and moving average process orders,
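As an illustration of the differencing part of the SARIMA expression, the sketch below applies the regular difference (1 − L) d times and the seasonal difference (1 − L^s) D times to a series. The function names and the toy series are assumptions made for this example; estimating the autoregressive and moving average polynomials is not shown.

```python
def difference(series, lag=1):
    """Apply (1 - L^lag) once: w_t = y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=1, D=1, s=12):
    """Apply (1 - L)^d followed by (1 - L^s)^D to the series."""
    out = list(series)
    for _ in range(d):
        out = difference(out, lag=1)
    for _ in range(D):
        out = difference(out, lag=s)
    return out


# Toy series (hypothetical): linear trend plus a repeating pattern of period 4.
season = [5.0, -2.0, 0.0, -3.0]
y = [10.0 + 2.0 * t + season[t % 4] for t in range(16)]
w = sarima_difference(y, d=1, D=1, s=4)
```

For this toy series one regular and one seasonal difference remove both the trend and the repeating pattern, so `w` contains only zeros; each difference shortens the series by its lag.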

Basic idea

The economic system time series, which has linear trend and nonlinear variation, is usually complex. Therefore, the linear trend can be predicted by a linear model, and the nonlinear characteristic hidden in the time series can be captured by some nonlinear forecasting models. Existing studies usually adopt a single nonlinear forecasting model, such as the SVR, GP, and BP neural network. However, these nonlinear models have their own advantages and disadvantages. Combining the forecasting
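The decomposition and reintegration idea can be sketched as below, assuming an additive split in which the nonlinear part is the residual left after the linear fit. A least-squares straight line stands in for the SARIMA model purely to keep the example short; the function names and toy numbers are hypothetical.

```python
def fit_line(y):
    """Least-squares line y ≈ intercept + slope * t (stand-in for SARIMA)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - slope * t_mean, slope


def decompose(y):
    """Split y into a linear part and a residual (assumed additive)."""
    intercept, slope = fit_line(y)
    linear = [intercept + slope * t for t in range(len(y))]
    residual = [v - l for v, l in zip(y, linear)]
    return linear, residual


def recombine(linear_forecast, residual_forecast):
    """Final forecast = linear forecast + nonlinear (residual) forecast."""
    return [l + r for l, r in zip(linear_forecast, residual_forecast)]


# Toy throughput series (hypothetical values).
y = [100.0, 104.0, 103.0, 110.0, 112.0, 111.0, 118.0, 121.0]
linear, residual = decompose(y)
rebuilt = recombine(linear, residual)
```

In the hybrid setting, the residual series is what the nonlinear models would be trained on, and `recombine` would be applied to the two sets of out-of-sample forecasts.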

Data and evaluation criteria

This study utilizes the monthly container throughput data of the Xiamen and Shanghai Ports in China from January 2001 to December 2015 for the experiment, and takes the data from January 2001 to December 2014 as the training set and the data in 2015 as the test set. Data are collected from the official website of the Ministry of Transport of the People's Republic of China. Fig. 6 shows the original container throughput time series of the two ports. The two series have strong cyclicality and
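As a quick check on the sample sizes implied by this split, the sketch below enumerates the months and partitions them by year. The variable names are illustrative, and no throughput values are included because the data themselves are not reproduced here.

```python
# Monthly observations from January 2001 to December 2015, as (year, month).
months = [(year, month) for year in range(2001, 2016) for month in range(1, 13)]

# Training set: January 2001 to December 2014; test set: the 12 months of 2015.
train_months = [ym for ym in months if ym[0] <= 2014]
test_months = [ym for ym in months if ym[0] == 2015]
```

This gives 180 monthly observations per port, of which 168 are used for training and 12 for testing.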

Conclusion

The accurate forecasting of container throughput is important for the investment and construction of port infrastructure, as well as in the operation and management of ports. This study proposes a hybrid forecasting model HFMG, which decomposes the container throughput time series into linear and nonlinear parts. The linear part is predicted by the SARIMA model. The model adopts three nonlinear single models, SVR, BP, and GP, to predict the residual subseries considering the complexity of

Acknowledgements

This research is partly supported by the Natural Science Foundation of China [grant numbers 71471124 and 71501136]; Excellent Youth Fund of Sichuan University [grant numbers skqx201607, sksyl201709, and skzx2016-rcrw14]; MOE Youth Project of Humanities and Social Sciences [grant number 15YJC860034]; Natural Science Foundation of Anhui Higher Education Institutions [grant number KJ2016A604]; Youth Backbone Visiting Research Key Project at Abroad [grant number gxfxZD2016219]; and Key Project of

References (53)

  • W. Seabrooke et al.

    Forecasting cargo growth and regional role of the Port of Hong Kong

    Cities

    (2003)
  • G. Teng et al.

    Cluster ensemble framework based on the group method of data handling

    Appl. Soft Comput.

    (2016)
  • J. Xiao et al.

    Structure identification of Bayesian classifiers based on GMDH

    Knowl. Based Syst.

    (2009)
  • J. Xiao et al.

    A dynamic classifier ensemble selection approach for noise data

    Inf. Sci.

    (2010)
  • G. Xie et al.

    Hybrid approaches based on LSSVR model for container throughput forecasting: a comparative study

    Appl. Soft Comput.

    (2013)
  • A. Johnson

    Forecasting Australia's International container trade

  • C. Narendra Babu et al.

    A moving-average filter based hybrid ARIMA-ANN model for forecasting time series data

    Appl. Soft Comput.

    (2014)
  • J. Joseph Beaulieu et al.

    Seasonal unit roots in aggregate U.S. data

    J. Econ.

    (1992)
  • G.E.P. Box et al.

    Time Series Analysis: Forecasting and Control

    (2015)
  • G.E.P. Box et al.

    Distribution of residual autocorrelations in autoregressive-integrated moving average time series models

    J. Am. Stat. Assoc.

    (1970)
  • C.P. Chen et al.

    Application of grey-Markov model in predicting container throughput of Fujian Province

    Adv. Mater. Res.

    (2013)
  • Z. Chen et al.

    Port cargo throughput forecasting based on combination model

    Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2016)

    (2016)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • S. Hylleberg et al.

    Seasonal integration and cointegration

    J. Econ.

    (1990)
  • J. Geng et al.

    Port throughput forecasting by using PPPR with chaotic efficient genetic algorithms and CMA

    IEEE International Conference on Systems, Man, and Cybernetics

    (2015)
  • S.W. Gikungu et al.

    Forecasting inflation rate in Kenya using SARIMA model

    Am. J. Theor. Appl. Stat.

    (2015)