abstract = "Recent work has shown that machine learning can
automate and in some cases outperform hand crafted
compiler optimizations. Central to such an approach is
that machine learning techniques typically rely upon
summaries or features of the program. The quality of
these features is critical to the accuracy of the
resulting machine learned algorithm; no machine
learning method will work well with poorly chosen
features. However, due to the size and complexity of
programs, theoretically there are an infinite number of
potential features to choose from. The compiler writer
now has to expend effort in choosing the best features
from this space. This paper develops a novel mechanism
to automatically find those features which most improve
the quality of the machine learned heuristic. The
feature space is described by a grammar and is then
searched with genetic programming and predictive
modeling. We apply this technique to loop unrolling in
GCC 4.3.1 and evaluate our approach on a Pentium 6. On
a benchmark suite of 57 programs, GCC's hard-coded
heuristic achieves only 3percent of the maximum
performance available, while a state of the art machine
learning approach with hand-coded features obtains
59percent. Our feature generation technique is able to
achieve 76percent of the maximum available speedup,
outperforming existing approaches.",
notes = "p85 'Our search technique is a hybrid between
Grammatical Evolution [12] and Genetic Programming
[13].'