Abstract
Document classification tasks generally have sparse, high-dimensional features, so effective feature extraction is important. Different categories and different datasets in document classification often share similarities, and a given task may have no labelled training data at all. To obtain effective classifiers for such a task, this paper proposes a Genetic Programming (GP) system that uses transductive transfer learning. The proposed GP system automatically extracts features from different source domains, and these GP-extracted features are combined to form new classifiers that are applied directly to a target domain. Experimental results show that the proposed transductive transfer learning GP system can evolve features from source domains that apply effectively to target domains similar to those source domains.
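The transductive setting described above can be illustrated with a minimal sketch. It is not the paper's system: a seeded hill climber over small weighted term lists stands in for GP, the combination rule (summing feature values and thresholding at zero) is an assumption, and every name and toy document below is hypothetical. The point is only the data flow: features are learned on labelled source domains and then applied, without retraining, to an unlabelled target domain.

```python
import random
from collections import Counter

def term_freqs(doc):
    """Relative term frequencies of a whitespace-tokenised document."""
    words = doc.lower().split()
    total = len(words) or 1
    return {w: c / total for w, c in Counter(words).items()}

def feature_value(feature, tf):
    """A feature is a list of (term, weight) pairs: a linear stand-in for a GP tree."""
    return sum(weight * tf.get(term, 0.0) for term, weight in feature)

def accuracy(feature, docs, labels):
    """Fraction of documents whose label matches the sign of the feature value."""
    preds = [1 if feature_value(feature, term_freqs(d)) > 0 else 0 for d in docs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def evolve_feature(docs, labels, rng, steps=200, size=3):
    """Hill climbing on one labelled source domain (assumed substitute for GP)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    best = [(rng.choice(vocab), rng.choice([-1.0, 1.0])) for _ in range(size)]
    best_fit = accuracy(best, docs, labels)
    for _ in range(steps):
        cand = list(best)
        # Mutate one (term, weight) pair and keep the candidate if it is no worse.
        cand[rng.randrange(size)] = (rng.choice(vocab), rng.choice([-1.0, 1.0]))
        fit = accuracy(cand, docs, labels)
        if fit >= best_fit:
            best, best_fit = cand, fit
    return best, best_fit

def classify_target(features, target_docs):
    """Combine source-domain features by summing them; no target labels are used."""
    preds = []
    for d in target_docs:
        tf = term_freqs(d)
        score = sum(feature_value(f, tf) for f in features)
        preds.append(1 if score > 0 else 0)
    return preds

rng = random.Random(0)
# Two hypothetical labelled source domains (1 = sport, 0 = politics).
source_a = (["goal match team win", "election vote party",
             "team score goal", "party policy vote"], [1, 0, 1, 0])
source_b = (["striker goal league", "minister vote law",
             "league team coach", "law party debate"], [1, 0, 1, 0])
# Hypothetical unlabelled target domain.
target_docs = ["coach team goal", "vote law minister"]

evolved = [evolve_feature(docs, labels, rng) for docs, labels in (source_a, source_b)]
features = [f for f, _ in evolved]
predictions = classify_target(features, target_docs)
print(predictions)
```

Because the target domain contributes no labels, the quality of `predictions` depends entirely on how much vocabulary and structure the target shares with the sources, which mirrors the abstract's caveat that the approach suits target domains similar to the source domains.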
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Fu, W., Xue, B., Zhang, M., Gao, X. (2017). Transductive Transfer Learning in Genetic Programming for Document Classification. In: Shi, Y., et al. Simulated Evolution and Learning. SEAL 2017. Lecture Notes in Computer Science, vol 10593. Springer, Cham. https://doi.org/10.1007/978-3-319-68759-9_45
DOI: https://doi.org/10.1007/978-3-319-68759-9_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68758-2
Online ISBN: 978-3-319-68759-9
eBook Packages: Computer Science, Computer Science (R0)