DOI: 10.1145/2463372.2463532
research-article

Automatic string replace by examples

Published: 6 July 2013

ABSTRACT

Search-and-replace is a text processing task that can be largely automated with regular expressions: the user must describe, in a specific formal language, the regions to be modified (the search pattern) and the corresponding desired changes (the replacement expression). Writing and tuning these expressions demands considerable familiarity with the formalism and is typically a lengthy, error-prone process.
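To make the two parts of a search-and-replace task concrete, the sketch below pairs a search pattern with a replacement expression using Python's standard `re` module. The task (rewriting ISO-style dates as DD/MM/YYYY) and the names `SEARCH_PATTERN`, `REPLACEMENT`, and `replace_dates` are hypothetical illustrations, not one of the tasks evaluated in the paper.

```python
import re

# Hypothetical search pattern: an ISO-style date (YYYY-MM-DD) captured in three groups.
SEARCH_PATTERN = r"(\d{4})-(\d{2})-(\d{2})"
# Hypothetical replacement expression: the captured groups reordered as DD/MM/YYYY.
REPLACEMENT = r"\3/\2/\1"


def replace_dates(text: str) -> str:
    """Apply the search-and-replace pair to every match in text."""
    return re.sub(SEARCH_PATTERN, REPLACEMENT, text)


print(replace_dates("Published 2013-07-06, revised 2013-08-01."))
# → Published 06/07/2013, revised 01/08/2013.
```

Even this small example shows why hand-writing such pairs is error-prone: the pattern must match exactly the intended regions, and the group numbers in the replacement must agree with the capture groups in the pattern.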

In this paper we propose a tool based on Genetic Programming (GP) that automatically generates both the search pattern and the replacement expression from examples alone. The user merely provides examples of the input text along with the desired output text and needs no knowledge of either the regular expression formalism or GP. We are not aware of any similar proposal. We experimentally evaluated our proposal on 4 different search-and-replace tasks operating on real-world datasets and obtained good results, which suggests that the approach may indeed be practically viable.
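The abstract does not describe the GP representation or the fitness function, so the following is only a rough sketch of how examples alone can drive the search: each candidate (search pattern, replacement expression) pair is scored by the total edit distance between the text it produces and the desired output the user supplied. The edit-distance fitness, the penalty for invalid candidates, the function names, and the sample data are all assumptions for illustration, not the authors' method. A GP system would generate and evolve candidates; here the population is a fixed, hand-written list, so only the evaluation and selection step is shown.

```python
import re
from typing import List, Tuple

# Hypothetical training data: (input text, desired output text) pairs from the user.
Example = Tuple[str, str]

INVALID_PENALTY = 10**9  # assumed score for candidates that raise re.error


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def fitness(pattern: str, replacement: str, examples: List[Example]) -> int:
    """Total edit distance between produced and desired outputs (lower is better)."""
    total = 0
    for source, desired in examples:
        try:
            produced = re.sub(pattern, replacement, source)
        except re.error:
            return INVALID_PENALTY  # bad pattern or bad group reference
        total += levenshtein(produced, desired)
    return total


# Hypothetical task: turn "Surname, I." into "I. Surname".
examples = [("Smith, J.", "J. Smith"), ("Doe, A.", "A. Doe")]
candidates = [
    (r"(\w+), (\w\.)", r"\2 \1"),  # swaps surname and initial
    (r"(\w+), (\w\.)", r"\1 \2"),  # only drops the comma
    (r"(\w+)", r"\3"),             # refers to a group that does not exist
]
best = min(candidates, key=lambda c: fitness(c[0], c[1], examples))
for pattern, replacement in candidates:
    print(pattern, replacement, fitness(pattern, replacement, examples))
```

A score of 0 means the pair reproduces every example exactly; in a full GP loop, scores like these would guide selection, crossover, and mutation of the candidate population.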


        Published in

        GECCO '13: Proceedings of the 15th annual conference on Genetic and evolutionary computation
        July 2013, 1672 pages
        ISBN: 9781450319638
        DOI: 10.1145/2463372
        Editor: Christian Blum
        General Chair: Enrique Alba

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        GECCO '13 paper acceptance rate: 204 of 570 submissions (36%). Overall acceptance rate: 1,669 of 4,410 submissions (38%).
