Syntactical Similarity Learning by Means of Grammatical Evolution

  • Conference paper
Parallel Problem Solving from Nature – PPSN XIV (PPSN 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9921)

Abstract

Several research efforts have shown that a similarity function synthesized from examples may capture an application-specific similarity criterion more effectively than a generic distance definition. In this work, we propose a similarity learning algorithm tailored to problems of syntax-based entity extraction from unstructured text streams. The algorithm takes as input pairs of strings, each labeled with an indication of whether or not the two strings adhere to the same syntactic pattern. Our approach is based on Grammatical Evolution and systematically explores a space of similarity definitions that includes all functions expressible in a simple, specialized language that we have defined for this purpose. We assessed our proposal on patterns representative of practical applications. The results suggest that the proposed approach is indeed feasible and that the learned similarity function is more effective than the Levenshtein distance and the Jaccard similarity index.
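For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of how they might be computed on labeled string pairs. It is not the authors' implementation. The choice of character bigrams as the set representation for the Jaccard index, the function names, and the example pairs are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def bigrams(s: str) -> Set[str]:
    """Set of character bigrams (an assumed tokenization)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two bigram sets, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # convention chosen here for two empty sets
    return len(x & y) / len(x | y)


# Hypothetical labeled pairs: True means "same syntactic pattern".
pairs: List[Tuple[str, str, bool]] = [
    ("2016-09-17", "1998-01-02", True),
    ("2016-09-17", "ppsn@example.org", False),
]

for left, right, same in pairs:
    print(same, levenshtein(left, right), round(jaccard(left, right), 3))
```

Both measures compare the characters of the two strings rather than their syntactic pattern, so two dates with few characters in common may score as dissimilar even though they follow the same format. This is the kind of limitation that motivates learning a similarity function from labeled pairs.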

Acknowledgements

We are grateful to Michele Furlanetto, who contributed to the implementation of our proposed method.

Author information

Corresponding author

Correspondence to Eric Medvet.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Bartoli, A., De Lorenzo, A., Medvet, E., Tarlao, F. (2016). Syntactical Similarity Learning by Means of Grammatical Evolution. In: Handl, J., Hart, E., Lewis, P., López-Ibáñez, M., Ochoa, G., Paechter, B. (eds) Parallel Problem Solving from Nature – PPSN XIV. PPSN 2016. Lecture Notes in Computer Science, vol 9921. Springer, Cham. https://doi.org/10.1007/978-3-319-45823-6_24

  • DOI: https://doi.org/10.1007/978-3-319-45823-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45822-9

  • Online ISBN: 978-3-319-45823-6

  • eBook Packages: Computer Science, Computer Science (R0)
