abstract = "With processors tending toward more processing cores
instead of higher clock speeds, developers are
increasingly forced to grapple with concurrent
paradigms to maximally exploit new CPU designs.
Embracing concurrent paradigms entails the potential
risk of encountering concurrent software faults.
Concurrent faults result from unforeseen timings of
interactions between concurrent components, as opposed
to traditional software faults that arise from
functional failures. As a result, concurrent faults
have a higher probability of surviving the software
development process, potentially causing a catastrophic
failure of high cost. As the complexity of software and
hardware systems increases they become increasingly
difficult to test. One measure of complexity is the
number of potential execution paths a system can
follow, with a higher complexity attributed to a
greater number of paths. In concurrent software, the
number of execution paths in a system typically
increases exponentially as the number of concurrent
components increase. Testing complex concurrent
software is therefore difficult, with state of the art
static and dynamic analysis techniques yielding only
false positives or exhausted resources. This problem is
likely only to be exacerbated given the trends
highlighted above. Stochastic metaheuristic search
techniques can often triumph where deterministic or
analytical techniques fail. Methods such as Genetic
Algorithms and Ant Colony Optimisation have shown great
strength on hard problems, including testing concurrent
software. Metaheuristic techniques often trade a
perfect solution for good enough solutions, and merely
accurately detecting a concurrent fault is better than
allowing a fault to survive to a production system.
Whilst metaheuristic techniques have had some success
in this domain, the state of the art still struggles
for a high success rate in some circumstances. There
are a few metaheuristic search techniques that have yet
to be tried in this area, and this thesis presents a
study on one such technique. This thesis presents a
literature review, detailing the state of the art in
detecting concurrent faults in software and hardware
systems. Following a review of metaheuristic techniques
applied to finding concurrent faults, I set out a
hypothesis asserting that a particular subclass of
metaheuristic techniques, Estimation of Distribution
Algorithms, are effective in detecting and debugging
concurrent faults. To investigate the hypothesis, I
first make an algorithmic proposal based on a
particular EDA to search the state space of concurrent
systems. I then demonstrate through experimentation the
ability of the algorithm to detect faults and to
optimise the quality of faults found in systems derived
from industrial scenarios. I also outline methods of
using features unique to EDAs to scale to large
systems. Finally, I complete the thesis with a
conclusion examining the hypothesis with respect to the
evidence collected from empirical work, highlighting
the novel aspects of the thesis and outlining future
paths of research.",
notes = "brief mention of GP. Appears to be an aid to manual
debugging rather than automatic bug fixer