Abstract
In this chapter we illustrate a framework based on symbolic regression that generates and sharpens questions about the nature of the underlying system and provides additional context and understanding from multivariate numeric data.
We emphasize the need to treat data modeling as a global process that iteratively applies data analysis and adaptation, model building, and problem reduction. We illustrate this on the problem of detecting outliers and extracting significant features from CountryData, a data set of economic, political, social and geographic data. We present two complementary ways of extracting outliers from the data: the content-based and the model-based approach. The content-based approach studies the geometrical structure of the multivariate data and uses data-balancing algorithms to sort the records in order of decreasing typicalness; the least typical records are identified as outliers before any modeling is applied to the data set. The model-based approach uses symbolic regression via Pareto genetic programming (GP) to identify records that are systematically under- or over-predicted by diverse ensembles of (thousands of) global non-linear symbolic regression models.
Applied to CountryData, both approaches produce insights into the division of world countries into outliers and prototypes, and into the economic properties that drive the prediction of gross domestic product (GDP) per capita.
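The two outlier-detection ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's method: the function names `typicalness_order` and `systematic_misprediction` are hypothetical, the mean distance to the k nearest neighbours stands in for the data-balancing typicalness measure, and the ensemble predictions may come from any set of models (the chapter uses Pareto GP symbolic regression models, which are not reproduced here). The thresholds `sign_share` and `z_cut` are likewise illustrative choices.

```python
import numpy as np


def typicalness_order(X, k=3):
    """Content-based sketch: order records from most to least typical.

    Assumption: typicalness is approximated by the mean distance to the
    k nearest neighbours in standardized space (smaller = more typical).
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns so that no single variable dominates distances.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)  # ignore each record's distance to itself
    knn_mean = np.sort(D, axis=1)[:, :k].mean(axis=1)
    order = np.argsort(knn_mean)  # last entries are outlier candidates
    return order, knn_mean


def systematic_misprediction(predictions, y, sign_share=0.9, z_cut=5.0):
    """Model-based sketch: flag records mispredicted in one direction.

    predictions has shape (n_models, n_records). A record is flagged when
    at least `sign_share` of the models err in the same direction and its
    median residual is extreme relative to the other records.
    """
    residuals = np.asarray(predictions, dtype=float) - np.asarray(y, dtype=float)[None, :]
    over = (residuals > 0).mean(axis=0)   # share of models over-predicting
    under = (residuals < 0).mean(axis=0)  # share of models under-predicting
    med = np.median(residuals, axis=0)
    center = np.median(med)
    # Robust scale of the per-record median residuals (scaled MAD).
    scale = 1.4826 * np.median(np.abs(med - center)) + 1e-12
    robust_z = (med - center) / scale
    consistent = np.maximum(over, under) >= sign_share
    flagged = np.where(consistent & (np.abs(robust_z) > z_cut))[0]
    return flagged, over, under
```

Requiring both sign consistency and a large median residual is a design choice of this sketch: sign consistency alone would also flag records with tiny but one-sided errors.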
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Kotanchek, M.E., Vladislavleva, E.Y., Smits, G.F. (2010). Symbolic Regression Via Genetic Programming as a Discovery Engine: Insights on Outliers and Prototypes. In: Riolo, R., O'Reilly, UM., McConaghy, T. (eds) Genetic Programming Theory and Practice VII. Genetic and Evolutionary Computation. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1626-6_4
DOI: https://doi.org/10.1007/978-1-4419-1626-6_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-1653-2
Online ISBN: 978-1-4419-1626-6
eBook Packages: Computer Science, Computer Science (R0)