Genetic programming for natural language processing

Genetic Programming and Evolvable Machines

Abstract

This work surveys the literature on applications of genetic programming to natural language processing problems. The purpose of natural language processing is to allow us to communicate with computers in natural language. One of the problems addressed in the area is information extraction, which draws relevant data from unstructured texts written in natural language. Some application domains are of particular relevance, either because the corresponding documents are difficult to process, as in opinion mining in social networks, or because the extracted information must be highly precise, as in the biomedical domain. Genetic programming techniques have been proposed in several of these areas. This survey reveals the potential, not yet fully exploited, of such applications. We also review some cases in which genetic programming can provide information that is absent from other approaches, revealing its ability to produce easy-to-interpret results in the form of programs or functions. Finally, we identify some important challenges in the area.

Acknowledgements

This work has been partially supported by the Spanish Ministry of Science and Innovation within the project PROSA-MED (TIN2016-77820-C3-2-R).

Author information

Corresponding author

Correspondence to Lourdes Araujo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Araujo, L. Genetic programming for natural language processing. Genet Program Evolvable Mach 21, 11–32 (2020). https://doi.org/10.1007/s10710-019-09361-5

  • DOI: https://doi.org/10.1007/s10710-019-09361-5
