ABSTRACT
Search-and-replace is a text processing task that can be largely automated with regular expressions: the user describes, in a specific formal language, the regions to be modified (the search pattern) and the corresponding desired changes (the replacement expression). Writing and tuning these expressions requires close familiarity with the formalism and is typically a lengthy, error-prone process.
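To make the two ingredients concrete, the following minimal sketch shows a hand-written search pattern and replacement expression applied with Python's standard `re` module. The date-rewriting task, the sample sentence, and the names `search_pattern`, `replacement_expression`, and `search_and_replace` are illustrative assumptions and are not taken from the paper.

```python
import re

# Hypothetical task (not from the paper): rewrite dates from MM/DD/YYYY to YYYY-MM-DD.
# The search pattern marks the regions to modify and captures their parts in groups.
search_pattern = r"(\d{2})/(\d{2})/(\d{4})"
# The replacement expression rearranges the captured groups.
replacement_expression = r"\3-\1-\2"


def search_and_replace(text: str) -> str:
    """Apply the pattern/replacement pair to every match in the text."""
    return re.sub(search_pattern, replacement_expression, text)


result = search_and_replace("Released on 07/04/2012 and updated on 12/25/2012.")
print(result)  # Released on 2012-07-04 and updated on 2012-12-25.
```

Even in this small case the user must know about capture groups, backreferences, and escaping, which is the kind of familiarity with the formalism that the abstract refers to.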
In this paper we propose a tool based on Genetic Programming (GP) that automatically generates both the search pattern and the replacement expression from examples alone. The user merely provides examples of the input text along with the desired output text and needs no knowledge of either the regular expression formalism or GP. We are not aware of any similar proposal. We experimentally evaluated our proposal on four different search-and-replace tasks operating on real-world datasets and obtained good results, which suggests that the approach may indeed be practically viable.
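The abstract does not say how candidate solutions are compared, so the following is only a sketch of one plausible way to score a candidate (search pattern, replacement expression) pair against user-provided examples. It assumes the score is the total character-level edit distance between the produced text and the desired text; the function names, the example pairs, and the two candidates are hypothetical, and the GP search that would generate candidates is not shown.

```python
import re
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, desired output text)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def score(pattern: str, replacement: str, examples: List[Example]) -> int:
    """Assumed fitness: total distance from produced to desired text (lower is better)."""
    total = 0
    for source, desired in examples:
        try:
            produced = re.sub(pattern, replacement, source)
        except (re.error, IndexError):
            produced = source  # an invalid candidate leaves the text unchanged
        total += edit_distance(produced, desired)
    return total


# Hypothetical examples and candidates, for illustration only.
examples = [("07/04/2012", "2012-07-04"), ("due 12/25/2012", "due 2012-12-25")]
candidates = [
    (r"/", "-"),
    (r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2"),
]
best = min(candidates, key=lambda c: score(c[0], c[1], examples))
```

Under this assumed measure, a candidate that reproduces every desired output exactly scores zero, so an example-driven search would prefer the second candidate over the first.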
Index Terms
Automatic string replace by examples