Abstract
This paper presents a data mining approach for variable selection and knowledge extraction from datasets. The approach is based on unguided symbolic regression, in which every variable in the dataset is treated as the target variable in multiple regression runs, and on a novel variable relevance metric for genetic programming. For each target variable, a model approximating it is created and the relevance of each input variable is calculated. The genetic programming configurations for the different target variables are executed multiple times to reduce stochastic effects, and the aggregated results are displayed as a variable interaction network. This network highlights important system components and implicit relations between the variables. The approach is tested on a blast furnace dataset, chosen because of the complexity of the blast furnace process and the many interrelations between its variables. Finally, the results are discussed with respect to existing knowledge about the blast furnace process.
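The outer loop of the procedure described in the abstract (each variable takes a turn as target, runs are repeated, and relevances are aggregated into directed edges) can be illustrated with a minimal sketch. It does not reproduce the genetic programming runs or the paper's relevance metric, neither of which is specified in the abstract. As a clearly labeled stand-in, each "run" scores an input by its squared Pearson correlation with the target on a bootstrap resample. The function names, the `runs` and `threshold` parameters, and their defaults are all hypothetical.

```python
import random
from statistics import mean


def squared_correlation(xs, ys):
    """Squared Pearson correlation; 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def stand_in_relevance(data, target, rng):
    """ASSUMPTION: placeholder for one GP run plus the paper's relevance metric.

    Scores every input by its squared correlation with the target on a
    bootstrap resample, so repeated calls vary the way stochastic runs do.
    """
    n = len(data[target])
    rows = [rng.randrange(n) for _ in range(n)]
    y = [data[target][i] for i in rows]
    return {
        name: squared_correlation([col[i] for i in rows], y)
        for name, col in data.items()
        if name != target
    }


def interaction_network(data, runs=10, threshold=0.5, seed=0):
    """Use every variable as target, average relevances over runs, keep strong edges.

    `data` maps variable names to equally long lists of numbers.
    Returns {(input_name, target_name): mean_relevance} for edges whose
    mean relevance reaches `threshold` (an illustrative cutoff).
    """
    rng = random.Random(seed)
    edges = {}
    for target in data:
        totals = {name: 0.0 for name in data if name != target}
        for _ in range(runs):
            for name, score in stand_in_relevance(data, target, rng).items():
                totals[name] += score
        for name, total in totals.items():
            avg = total / runs
            if avg >= threshold:
                edges[(name, target)] = avg  # directed edge: input -> target
    return edges
```

Calling `interaction_network` on a dictionary of columns returns the weighted, directed edges that would be drawn in such a network. Averaging over repeated runs is what dampens the run-to-run variation; in the paper this role is played by repeated genetic programming runs rather than bootstrap resamples.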
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kommenda, M., Kronberger, G., Feilmayr, C., Affenzeller, M. (2011). Data Mining Using Unguided Symbolic Regression on a Blast Furnace Dataset. In: Di Chio, C., et al. Applications of Evolutionary Computation. EvoApplications 2011. Lecture Notes in Computer Science, vol 6624. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20525-5_28
DOI: https://doi.org/10.1007/978-3-642-20525-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20524-8
Online ISBN: 978-3-642-20525-5
eBook Packages: Computer Science (R0)