Abstract
Practical data mining and process monitoring problems require the analysis of high-dimensional data. In most cases it is very informative to map and visualise the hidden structure of complex data in a low-dimensional space. Industrial applications require projections that are easy to implement, interpretable and accurate. Nonlinear functions (aggregates) are useful for this purpose. A pair of these functions realises feature selection and transformation, but finding the proper model structure is a complex nonlinear optimisation problem. We present a Genetic Programming (GP) based algorithm to generate aggregates represented in a tree structure. Results show that the developed tool can be effectively used to build an on-line spectroscopy-based process monitoring system; the two-dimensional mapping of a high-dimensional spectral database can represent different operating ranges of the process.
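As a rough illustration of how a pair of tree-structured aggregates can act as both a feature selector and a two-dimensional projection, the sketch below evaluates two hand-written expression trees on a small data set. It is not the authors' implementation: the tuple encoding of the trees, the helper names (`evaluate`, `used_features`, `project`), the protected division and the toy data are all assumptions made for this example, and the GP search that would evolve such trees against a mapping-quality criterion is omitted.

```python
import operator

# Protected division is a common GP convention; it is an assumption here,
# not something stated in the abstract.
def protected_div(a, b):
    return a / b if abs(b) > 1e-12 else 1.0

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}

def evaluate(tree, x):
    """Evaluate an expression tree on one sample x (a sequence of features).

    A tree is ("x", i) for feature i, ("const", c) for a constant,
    or (op, left, right) for a binary operator listed in OPS.
    """
    tag = tree[0]
    if tag == "x":
        return x[tree[1]]
    if tag == "const":
        return tree[1]
    return OPS[tag](evaluate(tree[1], x), evaluate(tree[2], x))

def used_features(tree):
    """Return the indices of the input features that the tree references."""
    tag = tree[0]
    if tag == "x":
        return {tree[1]}
    if tag == "const":
        return set()
    return used_features(tree[1]) | used_features(tree[2])

def project(tree_pair, data):
    """Map every sample to a 2-D point using a pair of aggregates."""
    f1, f2 = tree_pair
    return [(evaluate(f1, x), evaluate(f2, x)) for x in data]

# Two hypothetical aggregates over a 5-feature input.
f1 = ("/", ("x", 0), ("x", 3))                         # x0 / x3
f2 = ("-", ("*", ("x", 1), ("const", 2.0)), ("x", 3))  # 2 * x1 - x3

# Toy data standing in for spectra (made-up values).
spectra = [
    [1.0, 0.5, 0.2, 2.0, 0.1],
    [3.0, 1.5, 0.7, 1.0, 0.4],
]

points = project((f1, f2), spectra)
selected = used_features(f1) | used_features(f2)
print(points)            # 2-D coordinates, one pair per sample
print(sorted(selected))  # features actually used by the mapping
```

Only the features that appear in the two trees influence the mapping, which is the sense in which the pair of aggregates performs feature selection as well as transformation. In a GP setting, a population of such tree pairs would be varied by crossover and mutation and scored by a fitness function chosen by the user.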
Acknowledgments
The financial support of the TAMOP-4.2.2/B-10/1-2010-0025 and TAMOP-4.2.2.A-11/1/KONV-2012-0071 projects is gratefully acknowledged.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kulcsar, T., Bereznai, G., Sarossy, G., Auer, R., Abonyi, J. (2014). Visualisation of High Dimensional Data by Use of Genetic Programming: Application to On-line Infrared Spectroscopy Based Process Monitoring. In: Snášel, V., Krömer, P., Köppen, M., Schaefer, G. (eds) Soft Computing in Industrial Applications. Advances in Intelligent Systems and Computing, vol 223. Springer, Cham. https://doi.org/10.1007/978-3-319-00930-8_20
DOI: https://doi.org/10.1007/978-3-319-00930-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00929-2
Online ISBN: 978-3-319-00930-8
eBook Packages: Engineering (R0)