Deep Pattern Mining for Program Repair

author = "Kui Liu",

title = "Deep Pattern Mining for Program Repair",

school = "Interdisciplinary Centre for Security, Reliability and Trust (SNT), University of Luxembourg",

year = "2019",

address = "Luxembourg",

month = "18 " # dec,

keywords = "genetic algorithms, genetic programming, genetic improvement, APR, Security, Reliability and Trust, Program repair, fix pattern, pattern mining, fault localization, inconsistent method name",

URL = "https://orbilu.uni.lu/bitstream/10993/41348/1/thesis.pdf",

URL = "http://hdl.handle.net/10993/41348",

size = "173 pages",

abstract = "Error-free software is a myth. Debugging thus accounts for a significant portion of software maintenance and absorbs a large part of software cost. In particular, the manual task of fixing bugs is tedious, error-prone and time-consuming. In the last decade, automatic bug fixing, also referred to as automated program repair (APR), has boomed as a promising endeavour of software engineering towards alleviating developers' burden. Several potentially promising techniques have been proposed, making APR an increasingly prominent topic in both the research and practice communities. In production, APR will drastically reduce time-to-fix delays and limit downtime. In a development cycle, APR can help suggest changes to accelerate debugging.

As an emergent domain, however, program repair has many open problems that the community is still exploring. Our work contributes to this momentum from two angles: the repair of programs for functionality bugs, and the repair of programs for method naming issues. The thesis starts by highlighting findings from key empirical studies that we performed to inform future repair approaches. Then, we focus on template-based program repair scenarios and explore deep learning models for inferring accurate and relevant patterns. Finally, we integrate these patterns into APR pipelines, yielding state-of-the-art repair tools. The dissertation includes the following contributions:

Real-world Patch Study: Existing APR studies have shown that state-of-the-art automated repair techniques tend to generate patches for only a small number of bugs, and that even these patches can have quality issues (e.g., incorrect behaviour and nonsensical changes). To improve APR techniques, the community should deepen its knowledge of repair actions in real-world patches, since most techniques rely on patches written by human developers. However, previous investigations of real-world patches are limited to the statement level, which is not sufficiently fine-grained to build this knowledge. This dissertation starts by deepening this knowledge via a systematic and fine-grained study of real-world Java program bug fixes.

Fault Localization Impact: Existing test-suite-based APR systems are highly dependent on the performance of the fault localization (FL) technique, which is the first step of the widely studied APR pipeline. However, APR systems generally focus on patch generation and tend to use similar, yet different, strategies for fault localization. To assess the impact of FL on APR, we identify and investigate a practical bias caused by the FL step in a repair pipeline. We highlight the different FL configurations used in the literature and their impact on APR systems when applied to real bugs. Then, we explore the performance variations that can be achieved by tweaking the FL step.

Fix Pattern Mining: Fix patterns (a.k.a. fix templates) have been studied in various APR scenarios and widely used in different APR systems. To date, fix pattern mining has mainly been studied in three ways: manual summarisation, transformation inference, and code change action statistics. In this dissertation, we explore mining fix patterns for static bugs by leveraging deep learning and clustering algorithms.

Avatar: Fix pattern based patch generation is a promising direction in the APR community. Notably, it has been demonstrated to produce more acceptable and correct patches than those obtained with mutation operators through genetic programming. The performance of fix pattern based APR systems, however, depends on the fix ingredients mined from commit changes in development histories. Unfortunately, collecting a reliable set of bug fixes from repositories can be challenging. We therefore investigate the possibility of leveraging, in an APR scenario, code changes that address violations reported by static bug detection tools. To that end, we build the Avatar APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation.

TBar: Fix patterns are widely used in APR patch generation; however, the repair performance of individual fix patterns has not been studied. We revisit the performance of template-based APR to build comprehensive knowledge about the effectiveness of fix patterns, and to highlight the importance of complementary steps such as fault localization or donor code retrieval. To that end, we first investigate the literature to collect, summarize and label recurrently-used fix patterns. Based on this investigation, we build TBar, a straightforward APR tool that systematically attempts to apply these fix patterns to program bugs. We thoroughly evaluate TBar on the Defects4J benchmark. In particular, we assess the actual qualitative and quantitative diversity of fix patterns, as well as their effectiveness in yielding plausible or correct patches.

Debugging Method Names: Beyond semantic and static bugs in programs, we note that automatically debugging inconsistent method names is important for improving program quality. In this dissertation, we propose a deep learning based approach to spotting and refactoring inconsistent method names in programs.",

notes = "Language: English

Supervisor: Dr. Yves Le Traon",