Forecasting performance of regional innovation systems using semantic-based genetic programming with local search optimizer

https://doi.org/10.1016/j.cor.2018.02.001

Abstract

Measuring the innovation performance of regional innovation systems can serve as an important policymaking tool to identify best practices and provide aid to regions in need. Accurate forecasting of regional innovation performance plays a critical role in implementing policies intended to support innovation because it can be used to simulate the effects of actions and strategies. However, innovation is a complex and dynamic socio-economic phenomenon. Moreover, patterns in regional innovation structures are becoming increasingly diverse and non-linear. Therefore, developing an accurate forecasting tool for this problem is a challenge for optimization methods. The main aim of this paper is to develop a model based on a variant of genetic programming to address the regional innovation performance forecasting problem. Using historical data related to the regional knowledge base and competitiveness, the model should accurately and effectively predict a variety of innovation outputs, including patent counts, technological and non-technological innovation activity, and the economic effects of innovations. We show that the proposed model outperforms state-of-the-art machine learning methods.

Introduction

Innovation is considered a complex, dynamic, socio-technical, socio-economic and socio-political phenomenon that has been recognised as a central issue in economic development (Carayannis et al., 2016). The concept of regional innovation systems has recently received increased attention (Hajek et al., 2014, Lau and Lo, 2015), mainly due to the growing importance of regions (and other sub-national entities) in a globalised economy (Freeman, 2002). In regional innovation systems, private and public actors interact intensively and thus promote the generation, use, and dissemination of knowledge (Tödtling and Trippl, 2005). In addition, regions are critical entities for innovation policymaking because they provide favourable conditions for knowledge creation and transfer (Lau and Lo, 2015). In this context, measuring the innovation performance of regions has become a priority for developing integrated benchmarking systems in knowledge-based economies (Carayannis et al., 2016). This enables policymakers not only to comparatively evaluate the performance of regional innovation systems but also to identify best practices (innovation leaders) and target the regions in need (lagging regions). This is also why regional innovation performance is measured annually in many countries, for example using the innovation scoreboard for EU regions (Hollanders et al., 2016).

The advantages of using innovation performance indicators at the regional level can be summarized as follows (Evangelista et al., 2001): (1) analysts and statisticians have strong experience with the collection and use of such indicators; (2) these indicators comprehensively cover all countries, industries and technological fields; and (3) long time-series data are available to study the dynamics of the innovation performance of firms and industries across regions. In addition, the input-output innovation relationship is considered to be more robust at the regional level than at the firm level (Audretsch and Feldman, 2004). This is attributed both to the important role of the regional context and to the existence of externalities. Indeed, firm-level models may lead to incorrect inferences in the presence of a strong effect of the regional context on the generation of innovations (Naz et al., 2015). Note, however, that results obtained at the regional level cannot be interpreted at the level of individual firms, as they might differ significantly from those obtained from firm-level data due to biased estimates (Naz et al., 2015).

The main concern in measuring regional innovation performance is the complexity and dynamic change of regional innovation systems (Hajek et al., 2014). As a result, the data used for the evaluation quickly become obsolete. Therefore, developing an accurate and reliable forecasting tool to support decision making is a challenging task for optimization methods. Non-linear machine learning methods such as fuzzy rule-based systems and neural networks have been used for innovation forecasting at the firm level (Wang and Chien, 2006, Chien et al., 2010). These methods outperformed traditional statistical forecasting models in terms of accuracy, indicating non-linear patterns in firm innovation activities. Recent empirical evidence supports this assumption at both the regional (Hajek and Henriques, 2017) and national levels (de la Paz-Marín et al., 2012). Moreover, chaos theory was used to detect non-linearity and strange attractors in the evolutionary path of patent counts (Hung and Tu, 2014). Regarding innovation systems, Samara et al. (2012) developed an integrated system dynamics approach to analyse the impact of innovation policies on the performance of national innovation systems. However, to our knowledge, no previous research has forecast the performance of regional innovation systems using artificial intelligence methods. The main advantage of these methods over traditional statistical forecasting methods is that no complex mathematical formulation of the input-output relationships is necessary. Moreover, traditional methods are not suitable for modelling phenomena characterised by high variance (Castelli et al., 2016). In the case of regional innovation systems, the high variance is mainly due to the highly dynamic socio-economic environment. To address these issues, in this study we develop a forecasting model based on genetic programming.
Specifically, we use a recently proposed and promising variant of standard genetic programming that integrates semantic awareness and local search optimizers to generate forecasting models. We argue that this model is more appropriate for modelling the intrinsically non-linear character of innovation performance than traditional statistical and machine learning forecasting models because genetic programming (1) has excellent evolvability on training data (Vanneschi, 2017) and (2) is able to generalize the solution to testing data (Castelli et al., 2015b). Our approach combines two recent advancements in genetic programming, namely (1) geometric semantic operators (GSO), which eliminate local optima by inducing a unimodal error surface for any forecasting problem, and (2) a local search optimizer (LSO), which speeds up convergence. The main idea of combining these approaches is to achieve a balance between exploration (GSO) and exploitation (LSO). As a result, the forecasting model can be optimized faster and overfitting can be avoided.
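The claim that geometric semantic operators induce a unimodal error surface can be made concrete with a short derivation. It assumes, for illustration only, that error is measured as the root mean squared error over n training observations with target vector t = [t1, …, tn], and it uses the semantics s(T) of a program T, i.e. its vector of outputs on those observations:

```latex
\begin{aligned}
s(T) &= [T(x_1), T(x_2), \dots, T(x_n)], \qquad
E(T) = \tfrac{1}{\sqrt{n}}\,\lVert s(T) - t \rVert_2, \\
s_\lambda &= s(T) + \lambda\,\bigl(t - s(T)\bigr),\quad \lambda \in (0, 1]
\;\Longrightarrow\;
\tfrac{1}{\sqrt{n}}\,\lVert s_\lambda - t \rVert_2 = (1 - \lambda)\,E(T).
\end{aligned}
```

Whenever E(T) > 0, points arbitrarily close to s(T) therefore have strictly lower error, so the only local minimum of the error over semantic space is the global one at s(T) = t. An operator that moves directly in semantic space, as geometric semantic mutation does by perturbing s(T) within a bounded neighbourhood, searches this cone-shaped surface rather than the rugged landscape defined over program syntax.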

To verify the appropriateness of the proposed model for forecasting the performance of regional innovation systems, we used data on European regions for the period 2004–2012. The inputs of the forecasting model are indicators related to the regional knowledge base (regional knowledge generation, absorption, and transfer capacity) and regional competitiveness indexes approximating the regional socio-technical, socio-economic and socio-political environment. The outputs are four indicators of the performance of regional innovation systems, namely patent counts, technological and non-technological innovation activity, and economic effects of innovations. The models are first trained to forecast innovation performance for 2010 and then tested on 2012 data. We demonstrate that the proposed model outperforms other statistical and artificial intelligence methods in terms of accuracy on testing data.

The remainder of this paper is structured as follows. In the next section, we present the inputs and outputs of the model and describe the data used. The variant of genetic programming proposed in this study for the forecasting problem is introduced in the following sections. Section 6 describes the setting of the forecasting model and provides the experimental results comparing the proposed approach with other variants of the genetic programming algorithm and other state-of-the-art forecasting methods. Finally, we conclude the paper, highlighting the main contributions of this study.


Data

Four interacting categories of determinants have been introduced into the models of regional innovation systems, namely regional competitiveness, knowledge generation, knowledge absorption and knowledge transfer (Hajek et al., 2014, Lau and Lo, 2015, Tödtling and Trippl, 2005, Samara et al., 2012). Table 1 presents the determinants of regional innovation performance used in this study.

Different socio-economic conditions and regional competitiveness have been reported as an important determinant

An introduction to genetic programming

Genetic Programming (GP) (Koza, 1992) is a computational method that belongs to the computational intelligence research area called evolutionary computation (Eiben et al., 2003). GP consists of the automated learning of computer programs by means of a process inspired by Darwin's theory of biological evolution. In the context of GP, the word program can be interpreted in general terms, and thus GP can be applied to the particular cases of learning expressions, functions and, as in this
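To make the notion of a program concrete, the sketch below encodes a GP individual as an expression tree over input variables and constants and evaluates it on one observation. The nested-tuple encoding, the function set (+, −, ×, protected ÷), and the helper names `evaluate` and `random_tree` are illustrative assumptions, not the representation used in this study.

```python
import random
from typing import Union

# Hypothetical encoding: a tree is ("op", left, right), a variable name such
# as "x0", or a numeric constant.
Tree = Union[tuple, str, float]

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Protected division returns 1.0 for a near-zero denominator, a common
    # GP convention that keeps every randomly generated program executable.
    "/": lambda a, b: a / b if abs(b) > 1e-6 else 1.0,
}


def evaluate(tree: Tree, x: list[float]) -> float:
    """Evaluate a program tree on one observation x (a list of input values)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    if isinstance(tree, str):  # variable terminal, e.g. "x1" -> x[1]
        return x[int(tree[1:])]
    return float(tree)  # constant terminal


def random_tree(n_vars: int, depth: int, rng: random.Random) -> Tree:
    """Grow a random tree of at most the given depth (hypothetical helper)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return f"x{rng.randrange(n_vars)}"
        return round(rng.uniform(-1.0, 1.0), 2)
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(n_vars, depth - 1, rng),
            random_tree(n_vars, depth - 1, rng))
```

For instance, `evaluate(("+", "x0", ("*", 2.0, "x1")), [1.0, 3.0])` returns 7.0. Evolution then repeatedly selects, recombines and mutates such trees according to their error on the training data.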

Geometric semantic genetic programming

Even though the term semantics can have several different interpretations, it is a common trend in the GP community (and one we follow here) to define the semantics of a solution as the vector s(T) = [T(x1), T(x2), …, T(xn)] of its output values. From this perspective, a GP individual can be identified by a point (its semantics s(T)) in a multidimensional space that we call semantic space (where the number of dimensions is equal to the number of observations in the training set (or
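A minimal NumPy sketch of the semantic view: semantics as an output vector, error as a function of semantics alone, and a geometric semantic mutation that acts directly on semantics. The mutation form T + ms·(σ(R1) − σ(R2)), with σ a logistic function and ms a mutation step, is the formulation commonly used in GSGP and is an assumption here; the data, the parent program and the two "random" programs are hypothetical stand-ins.

```python
import numpy as np


def semantics(program, X):
    """s(T) = [T(x1), ..., T(xn)]: outputs of program T on the n training cases."""
    return np.array([program(x) for x in X], dtype=float)


def rmse(s, target):
    """Error depends only on semantics: a scaled Euclidean distance to the target."""
    return float(np.sqrt(np.mean((s - target) ** 2)))


def logistic(v):
    """Squash values into (0, 1); assumed here to bound the random trees' outputs."""
    return 1.0 / (1.0 + np.exp(-v))


def gsm_semantics(s_parent, s_r1, s_r2, ms):
    """Semantics of the offspring T + ms * (logistic(R1) - logistic(R2)).

    Computed from the parents' semantics alone, so the offspring tree
    (which grows with every mutation) never needs to be re-evaluated.
    """
    return s_parent + ms * (logistic(s_r1) - logistic(s_r2))


rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(20, 3))      # 20 hypothetical regions, 3 inputs
target = X @ np.array([0.5, -0.2, 1.0])      # hypothetical innovation output
parent = semantics(lambda x: x[0] + x[2], X)
r1 = semantics(lambda x: x[1] - x[0], X)     # stand-ins for random trees
r2 = semantics(lambda x: x[0] * x[2], X)
child = gsm_semantics(parent, r1, r2, ms=0.1)
print(rmse(parent, target), rmse(child, target))
```

Because the logistic outputs lie in (0, 1), every coordinate of the child's semantics differs from the parent's by less than ms, so the mutation samples a point in a small box around the parent in semantic space.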

Local search in GP and GSGP

In Section 5.1, we discuss previous approaches for integrating Local Search (LS) with standard GP. Afterwards, in Section 5.2, we present the first integration of a local searcher within GSGP.
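One way a local searcher can be attached to geometric semantic mutation is sketched below. The form assumed here (not spelled out in this snippet) replaces the fixed mutation step with coefficients fitted by ordinary least squares on the training semantics, child = α0 + α1·s(T) + α2·(s(R1) − s(R2)). The function name `gsm_ls_semantics` and the example data are hypothetical.

```python
import numpy as np


def gsm_ls_semantics(s_parent, s_r1, s_r2, target):
    """Local-search mutation sketch: fit alpha in
        child = a0 + a1 * s(T) + a2 * (s(R1) - s(R2))
    by ordinary least squares against the training targets.
    """
    design = np.column_stack([
        np.ones_like(s_parent),  # intercept a0
        s_parent,                # scaling of the parent, a1
        s_r1 - s_r2,             # mutation direction, weight a2 replaces a fixed ms
    ])
    alpha, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ alpha, alpha


rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))
target = 2.0 * X[:, 0] - X[:, 1] + 0.3   # hypothetical training targets
parent = X[:, 0]                         # semantics of a hypothetical parent
r1, r2 = X[:, 1], X[:, 0] ** 2           # semantics of two stand-in random trees
child, alpha = gsm_ls_semantics(parent, r1, r2, target)
```

Since α = (0, 1, 0) reproduces the parent and α = (0, 1, ms) reproduces a plain mutation with step ms, the least-squares child can never have a larger training error than either. This is the exploitation step that complements the exploratory geometric semantic operators.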

Experiments

This section describes the data pre-processing, experimental settings and the obtained results.

Conclusion and future directions

This study argued that the proposed GP-based model is more appropriate for modelling the intrinsically complex and non-linear character of regional innovation performance than traditional statistical and machine learning forecasting models. The results of this study indicate that the GP-based model significantly outperforms other forecasting models in terms of test error. These results also suggest that the proposed forecasting model not only provides a good solution on training data but it also avoids

Acknowledgements

We gratefully acknowledge the constructive comments of the anonymous referees. This work was supported by the scientific research project of the Czech Science Foundation, grant no. 17-11795S.

References (60)

  • P. Hajek et al.

    Visualising components of regional innovation systems using self-organizing maps—evidence from European regions

    Technol. Forecast. Soc. Change

    (2014)
  • V. Hájková et al.

    Efficiency of knowledge bases in urban population and economic growth - evidence from European cities

    Cities

    (2014)
  • S.C. Hung et al.

    Is small actually big? The chaos of technological change

    Res. Policy

    (2014)
  • A.K.W. Lau et al.

    Regional innovation system, absorptive capacity and innovation performance: an empirical study

    Technol. Forecast. Soc. Change

    (2015)
  • E. Samara et al.

    The impact of innovation policies on the performance of national innovation systems: a system dynamics analysis

    Technovation

    (2012)
  • F. Tödtling et al.

    One size fits all?

    Res. Policy

    (2005)
  • G. Tripepi et al.

    Linear and logistic regression analysis

    Kidney Int.

    (2008)
  • T.Y. Wang et al.

    Forecasting innovation performance via neural networks - a case of Taiwanese manufacturing industry

    Technovation

    (2006)
  • P. Annoni et al.

    EU Regional Competitiveness Index

    (2017)
  • B.T. Asheim et al.

    Constructing regional advantage: platform policies based on related variety and differentiated knowledge bases

    Reg. Stud.

    (2011)
  • D. Basak et al.

    Support vector regression

    Neural Inf. Process. Lett. Rev.

    (2007)
  • L. Breiman

    Bagging predictors

    Mach. Learn.

    (1996)
  • L. Breiman

    Stacked regressions

    Mach. Learn.

    (1996)
  • T. Brenner et al.

    Methodological issues in measuring innovation performance of spatial units

    Ind. Innov.

    (2011)
  • M. Castelli et al.

    A C++ framework for geometric semantic genetic programming

    Genet. Program. Evolvable Mach.

    (2015)
  • M. Castelli et al.

    Geometric semantic genetic programming with local search

  • H. Chen et al.

    A patent time series processing component for technology intelligence by trend identification functionality

    Neural Comput. Appl.

    (2015)
  • X. Chen et al.

    A multi-facet survey on memetic computation

    IEEE Trans. Evol. Comput.

    (2011)
  • P. Cooke

    Knowledge economies: clusters, learning and cooperative advantage

    (2002)