Abstract
Financial forecasting is an important area in computational finance. Evolutionary Dynamic Data Investment Evaluator (EDDIE) is an established genetic programming (GP) financial forecasting algorithm, which has been successfully applied to a number of international financial datasets. The purpose of this paper is to further improve the algorithm’s predictive performance by incorporating heuristics in the search. We propose two heuristics: a sequential covering strategy that iteratively builds a solution in combination with the GP search, and an entropy-based dynamic discretisation procedure for numeric values. To examine the effectiveness of the proposed improvements, we test the new EDDIE version (EDDIE 9) across 20 datasets and compare its predictive performance against three previous EDDIE algorithms. In addition, we compare our new algorithm’s performance against C4.5 and RIPPER, two state-of-the-art classification algorithms. Results show that the introduction of heuristics is very successful, allowing the algorithm to outperform all previous EDDIE versions and the well-known C4.5 and RIPPER algorithms. Results also show that the algorithm is able to return significantly high rates of return across the majority of the datasets.
Notes
We use these indicators because they have proved quite useful in developing GDTs in previous works such as Martinez-Jaramillo (2007), Allen and Karjalainen (1999) and Austin et al. (2004). Of course, there is no reason not to use other information, such as fundamentals or the limit order book. However, the aim of this work is not to find the ultimate indicators for financial forecasting.
These are the 6 indicators mentioned earlier; each indicator has two different period lengths, 12 and 50 days, resulting in a total of 12 technical indicators.
As we have mentioned, each GDT makes recommendations of buy (1) or not-to-buy (0). The former denotes a positive signal and the latter a negative. Thus, within the range of the training period, which is \(t\) days, a GDT will have returned a number of positive signals.
To make this clearer, let us give an example: if a given GP tree can have a maximum of \(k\) indicators, then the permutations of the available 12 indicators (we are using 6 different indicators, with 2 periods each, thus \(6 \times 2 = 12\)) under EDDIE 7 are \(12^k\); on the other hand, if EDDIE 8 is using the same 6 indicators with periods within the range of 2 to 65 days, then the permutations of the available 384 indicators (we are using 6 different indicators with \(65 - 1 = 64\) periods each, thus \(64 \times 6 = 384\)) are \(384^k\). EDDIE 8’s search space is therefore significantly larger, which can explain EDDIE 8’s difficulty in consistently finding good solutions.
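As a rough illustration of the arithmetic in this note, the sketch below evaluates both search-space sizes for one tree size. The function name `search_space_size` and the value \(k=3\) are assumptions chosen purely for illustration; they do not come from the EDDIE implementation.

```python
def search_space_size(num_indicators: int, num_periods: int, k: int) -> int:
    """Count the ways to fill k indicator slots when each slot can hold any
    (indicator, period) pair and pairs may repeat across slots."""
    return (num_indicators * num_periods) ** k


K = 3  # hypothetical maximum number of indicators in a tree

# EDDIE 7: 6 indicators, 2 fixed periods (12 and 50 days) -> 12 choices per slot
eddie7 = search_space_size(6, 2, K)

# EDDIE 8: 6 indicators, 64 periods each -> 384 choices per slot
eddie8 = search_space_size(6, 64, K)

# Each slot has 384 / 12 = 32 times more choices, so the ratio is 32**k
ratio = eddie8 // eddie7

print(eddie7, eddie8, ratio)  # 1728 56623104 32768
```

Even for this small tree, the EDDIE 8 space is \(32^3 = 32768\) times larger, and the gap widens exponentially with \(k\).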
The datasets used in our experiments can be downloaded from: http://www.cs.kent.ac.uk/people/staff/mk451/datasets.html.
Refer to Sect. 5.2 for the definition of best tree.
References
Abdelmalek W, Hamida S, Abid F (2009) Selecting the best forecasting-implied volatility model using genetic programming. J Appl Math Decis Sci 2009:179230
Abdou H (2009) Genetic programming for credit scoring: the case of Egyptian public sector banks. Expert Syst Appl 36(9):11402–11417
Agapitos A, O’Neill M, Brabazon A (2010) Evolutionary learning of technical trading rules without data-mining bias. In: Schaefer R, Cotta C, Kołodziej J, Rudolph G (eds) Parallel problem solving from nature—PPSN XI, Springer, Lecture notes in computer science, vol 6238, pp 294–303
Allen F, Karjalainen R (1999) Using genetic algorithms to find technical trading rules. J Financ Econ 51:245–271
Austin M, Bates G, Dempster M, Leemans V, Williams S (2004) Adaptive systems for foreign exchange trading. Quant Financ 4(4):37–45
Backus J (1959) The syntax and semantics of the proposed international algebraic language of Zurich. In: International conference on information processing, UNESCO, pp 125–132
Binner J, Kendall G, Chen SH (eds) (2004) Applications of artificial intelligence in finance and economics. Advances in econometrics, vol 19. Elsevier
Brookhouse J, Otero FEB, Kampouridis M (2014) Working with OpenCL to speed up a genetic programming financial forecasting algorithm: initial results. In: Wagner S, Affenzeller M (eds) GECCO 2014 workshop on evolutionary computation software systems (EvoSoft), pp 1117–1124
Chen SH (2002) Genetic algorithms and genetic programming in computational finance. Springer-Verlag, New York LLC
Cohen W (1995) Fast effective rule induction. In: Proceedings of the 12th international conference on machine learning, Morgan Kaufmann, pp 115–123
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Edwards R, Magee J (1992) Technical analysis of stock trends. New York Institute of Finance, New York
Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery: an overview. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. MIT Press, pp 1–34
García S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. J Mach Learn Res 9:2677–2694
Giacobini M, Provero P, Vanneschi L, Mauri G (2014) Towards the use of genetic programming for the prediction of survival in cancer. In: Cagnoni S, Mirolli M, Villani M (eds) Evolution, complexity and artificial life. Springer, Berlin, pp 177–192
Hu Y (1998) Constructive induction: covering attribute spectrum. Feature extraction construction and selection. Kluwer Academic Publishers, pp 257–272
Kampouridis M, Otero FEB (2013) Using attribute construction to improve the predictability of a GP financial forecasting algorithm. In: Proceedings of the conference on technologies and applications of artificial intelligence, IEEE Xplore, pp 55–60
Kampouridis M, Tsang E (2010) EDDIE for investment opportunities forecasting: extending the search space of the GP. In: Proceedings of the IEEE world congress on computational intelligence, Barcelona, Spain, pp 2019–2026
Kampouridis M, Tsang E (2012) Investment opportunities forecasting: extending the grammar of a GP-based tool. Int J Comput Intell Syst 5(3):530–541
Koza J (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge
Krawiec K (2002) Genetic programming-based construction of features for machine learning and knowledge discovery tasks. Genet Program Evol Mach 3(4):329–343
Li J (2001) FGP: a genetic programming-based financial forecasting tool. PhD thesis, Department of Computer Science, University of Essex
Martinez-Jaramillo S (2007) Artificial financial markets: an agent-based approach to reproduce stylized facts and to study the red queen effect. PhD thesis, CFFEA, University of Essex
Otero FEB, Silva M, Freitas A, Nievola J (2003) Genetic programming for attribute construction in data mining. In: Proceedings of EuroGP, LNCS 2610, pp 384–393
Otero FEB, Freitas A, Johnson C (2008) cAnt-Miner: an ant colony classification algorithm to cope with continuous attributes. In: Ant colony optimization and swarm intelligence (Proceedings of ANTS 2008), pp 48–59
Otero FEB, Freitas A, Johnson C (2013) A new sequential covering strategy for inducing classification rules with ant colony algorithms. IEEE Trans Evol Comput 17(1):64–76
Otero FEB, Johnson CG (2013) Automated problem decomposition for the boolean domain with genetic programming. In: Proceedings of the 16th European conference on genetic programming, EuroGP 2013, Vienna, Austria, pp 169–180
Phua C, Lee V, Smith K, Gayler R (2010) A comprehensive survey of data mining-based fraud detection research. http://www.bsys.monash.edu.au/people/cphua/
Piatetsky-Shapiro G, Frawley W (1991) Knowledge discovery in databases. AAAI Press, Menlo Park, California
Poli R, Langdon W, McPhee N (2008) A field guide to genetic programming. Lulu.com
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco
Dos Santos J, Ferreira C, Da S, Torres R, Gonçalves M, Lamparelli R (2011) A relevance feedback method based on genetic programming for classification of remote sensing images. Inf Sci 181(13):2671–2684
Tsang E, Martinez-Jaramillo S (2004) Computational finance. IEEE Comput Intell Soc Newsl 3–8
Tsang E, Li J, Markose S, Er H, Salhi A, Iori G (2000) EDDIE in financial decision making. J Manag Econ 4(4) (online)
Tsang E, Markose S, Er H (2005) Chance discovery in stock index option and future arbitrage. New Math Nat Comput World Sci 1(3):435–447
Wang P, Tsang E, Weise T, Tang K, Yao X (2010) Using GP to evolve decision rules for classification in financial data sets. In: Cognitive informatics (ICCI), 2010 9th IEEE international conference on, pp 720–727
Wilson G, Banzhaf W (2010) Fast and effective predictability filters for stock price series using linear genetic programming. In: Evolutionary computation (CEC), 2010 IEEE congress on, pp 1–8, doi:10.1109/CEC.2010.5586297
Witten H, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco, California
Additional information
Communicated by C.-S. Lee.
Cite this article
Kampouridis, M., Otero, F.E.B. Heuristic procedures for improving the predictability of a genetic programming financial forecasting algorithm. Soft Comput 21, 295–310 (2017). https://doi.org/10.1007/s00500-015-1614-8
DOI: https://doi.org/10.1007/s00500-015-1614-8