Genetic Algorithms Digest Tuesday, February 14, 1995 Volume 9 : Issue 7
- Do NOT send email or reply to gadistr@aic.nrl.navy.mil (GA List Moderator)
- Send submissions (articles) to GA-List@AIC.NRL.NAVY.MIL
- Send administrative (subscribe, unsubscribe, change of address, etc.,)
requests to GA-List-Request@AIC.NRL.NAVY.MIL
******************************************************************************
- You can access back issues, GA code, conference announcements, etc.,
either through the WWW at URL http://www.aic.nrl.navy.mil/galist/ or
through anonymous ftp at ftp.aic.nrl.navy.mil [192.26.18.68] in /pub/galist.
******************************************************************************
From: wgm@santafe.edu (Bill Macready)
Date: Tue, 7 Feb 95 19:02:44 MST
Subject: paper available - No Free Lunch Theorems for Search
No Free Lunch Theorems for Search
D.H. Wolpert, W.G. Macready
We show that all algorithms that search for an extremum of a cost
function perform exactly the same, when averaged over all possible
cost functions. In particular, if algorithm A outperforms algorithm
B on some cost functions, then loosely speaking there must exist
exactly as many other functions where B outperforms A. Starting from
this we analyze a number of the other a priori characteristics of
the search problem, like its geometry and its information-theoretic
aspects. This analysis allows us to derive mathematical benchmarks
for assessing a particular search algorithm's performance. We also
investigate minimax aspects of the search problem, the validity of
using characteristics of a partial search over a cost function to
predict future behavior of the search algorithm on that cost
function, and time-varying cost functions. We conclude with some
discussion of the justifiability of biologically-inspired search
methods.
(32 pages)
To obtain an electronic copy of this paper:
ftp ftp.santafe.edu
login: anonymous
password:
cd /pub/wgm
get nfl.ps
quit
Then at your system:
lpr -P<printer> nfl.ps
If you have trouble getting any of these papers electronically, you can
request a hard copy from Deborah Smith (drs@santafe.edu), Santa Fe
Institute, 1399 Hyde Park Road, Santa Fe, NM, USA, 87501.
Bill Macready
wgm@santafe.edu
-----------------------------------------------------------------------------
From: William M. Spears (spears@aic.nrl.navy.mil)
Date: Wed, 17 Aug 1994 (Excerpt from my August message)
Subject: Conservation Law for search techniques
This discussion prompted me to think about why it is that
I rarely modify my GA. Why don't I use uniform crossover, ranking,
steady state, etc.? I realized it is because I had an implicit
conservation law in my head saying "Well, making this modification
will help me on some problems, but hurt me on others, so why bother?"
So, I started wondering if my implicit assumption were true. For
example, suppose I consider two search techniques: random search
and enumerative search. Then, over all possible problems I think
that one could state that (using a reasonable performance measure)
enumerative search has better expected performance, if all problems
are equally likely. The reason appears to be simple: random search
samples with replacement, while enumerative search does not.
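This intuition is easy to check numerically. A minimal sketch (the 16-point space, uniformly random fitness values, and trial count are all invented for illustration): enumeration in a shuffled order finds the maximum in about (N+1)/2 queries on average, while sampling with replacement needs about N.

```python
import random

N = 16          # size of the search space X
TRIALS = 2000   # number of random fitness functions drawn

def enumerative(n, rng):
    """Enumerative search: visit every point once, in a shuffled order."""
    xs = list(range(n))
    rng.shuffle(xs)
    return xs

def with_replacement(n, rng):
    """Random search: each query drawn independently, revisits allowed."""
    while True:
        yield rng.randrange(n)

def queries_until_max(f, points):
    """Count queries until the sampler first hits the maximum of f."""
    target = max(f)
    for t, x in enumerate(points, 1):
        if f[x] == target:
            return t

rng = random.Random(0)
enum_total = rand_total = 0
for _ in range(TRIALS):
    f = [rng.random() for _ in range(N)]   # a random fitness function
    enum_total += queries_until_max(f, enumerative(N, rng))
    rand_total += queries_until_max(f, with_replacement(N, rng))

print("enumerative avg queries:", enum_total / TRIALS)   # near (N+1)/2 = 8.5
print("with-replacement avg  :", rand_total / TRIALS)    # near N = 16
```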
Now, suppose we only consider stochastic search algorithms that
sample with replacement. Can we make any strong statements? For
example, will the addition of crossover help you as often as
hurt you? Is binary coding in any sense "better" than another coding?
Of course, the reader may also be thinking "I don't care about all
problems in the universe, and I sure don't expect to see all of them
with equal probability." Fair enough. So, can we make any statements,
assuming something other than a uniform distribution of problems?
For example, can we state that algorithm X is better on problem
class Y? If so, can we now quickly detect which problem class
our problem is in, and match an algorithm to that class?
I certainly do not pretend to know the answers to these questions,
but I thought some of you may be interested in thinking about these
issues.
Bill Spears@aic.nrl.navy.mil
------------------------------
From: wgm@santafe.edu (Bill Macready)
Date: Thu, 9 Feb 95 18:29:07 MST
Subject: Re: NFL/GA-List
Recently we announced a paper on "no-free-lunch" (NFL) theorems for
search. Concerning this topic William Spears says:
>>>
consider two search techniques: random search
and enumerative search. Then, over all possible problems I think
that one could state that (using a reasonable performance measure)
enumerative search has better expected performance, if all problems
are equally likely. The reason appears to be simple: random search
samples with replacement, while enumerative search does not.
Now, suppose we only consider stochastic search algorithms that
sample with replacement. Can we make any strong statements? For
example, will the addition of crossover help you as often as
hurt you? Is binary coding in any sense "better" than another coding?
>>>
Bill Spears is exactly right in his intuition that sampling with
replacement is inferior to sampling without replacement, if one
measures an algorithm's performance by the total number of queries it
has made of the fitness function (in addition, of course, to the
fitness of the population that is associated with those queries).
For this reason, we found that it makes sense to measure an
algorithm's performance by the number of distinct (!) queries it has
made of the fitness function. With such a measure, an algorithm won't
be penalized simply if it happens to resample a point it has already
visited.
In our paper, we adopt such a measure. The result is the following
"no-free-lunch" (NFL) theorem: Average over all fitness
functions. Then for any histogram of fitness values c, all search
algorithms have EXACTLY the same probability of generating a
population whose fitness values are described by c.
So in particular, if we are interested in maximizing fitness, we look
at the bin containing the highest fitness value. Since all algorithms
produce the same histogram they produce the SAME maximum fitness. If
one is interested (I don't know why) in the average fitness over time
produced by the algorithm then that too is identical for all
algorithms.
A concrete if somewhat surprising example:
I'd like to maximize a fitness function and I want to compare
hill-climbing to hill-descending. There are as many (loosely
speaking) fitness functions where hill-descending wins as there are
where hill-climbing wins!
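On a toy space this can be verified exhaustively. In the sketch below, the domain size, fitness alphabet, and the particular climber/descender rules are all invented for the demonstration; both searchers are deterministic, never re-query a point, and consult only fitness values they have already queried, as the NFL setup requires. Enumerating every fitness function on a five-point line, the two algorithms' histograms of best fitness found come out identical.

```python
from itertools import product
from collections import Counter

N = 5                 # tiny search space: points 0..4 on a line
YVALS = (0, 1, 2)     # possible fitness values
BUDGET = 4            # number of distinct queries allowed

def search(f, prefer_high):
    """Deterministic non-revisiting search.  From the current point it
    queries the unvisited neighbours, then moves toward the best (or
    worst) value observed; when stuck, it restarts at the lowest
    unvisited point."""
    seen = [2]                       # start in the middle
    cur = 2
    while len(seen) < BUDGET:
        nbrs = [x for x in (cur - 1, cur + 1) if 0 <= x < N and x not in seen]
        if not nbrs:
            cur = min(x for x in range(N) if x not in seen)
            seen.append(cur)
            continue
        for x in nbrs:
            if len(seen) < BUDGET:
                seen.append(x)       # each neighbour query costs one sample
        queried = [x for x in nbrs if x in seen]
        pick = max if prefer_high else min
        cur = pick(queried, key=lambda x: f[x])
    return max(f[x] for x in seen)   # best fitness in the sampled population

climb, descend = Counter(), Counter()
for f in product(YVALS, repeat=N):   # all 3^5 = 243 fitness functions
    climb[search(f, True)] += 1
    descend[search(f, False)] += 1

print(climb == descend)              # True: identical best-found histograms
```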
So Bill Spears is right that crossover will help as readily as it
hurts. And binary coding is no better than any other coding.
On the other hand, there may indeed be minimax distinctions between
algorithms, and these distinctions may very well end up preferring
stochastic algorithms. To explain this, consider two search
algorithms, A and B. Say we are interested in average fitness across
populations generated by those algorithms. Then we know that according
to this measure, averaged over all fitness functions, A equals B - the
associated populations have the same average fitness. That's what the
no-free-lunch theorem says.
On the other hand, it may well be that for most fitness functions, A
beats B, by just a small amount. And for the few remaining fitness
functions, B beats A by a lot. (Hence the averages are the same.) Now
in this case *no matter what the fitness function*, we will never go
far wrong by using algorithm B, and sometimes we will have a big win
(in comparison to using A). However the same is not true for A. In
this sense, one could argue that B is minimax superior to A.
Little is currently known about how common minimax distinctions
between algorithms are. One can conceive of possible minimax
advantages a stochastic algorithm might have, but it's a completely
open question!
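A toy numerical illustration of such a minimax distinction (the performance numbers below are invented, chosen only so that the two averages coincide):

```python
# Hypothetical per-function performance (higher = better) for two
# algorithms on ten equally likely fitness functions.
perf_A = [6.0] * 9 + [-4.0]   # A beats B slightly on nine functions...
perf_B = [5.0] * 10           # ...but loses badly on the tenth

avg_A = sum(perf_A) / len(perf_A)
avg_B = sum(perf_B) / len(perf_B)
print(avg_A, avg_B)           # 5.0 5.0 -- equal averages, as NFL requires

# Worst-case regret from committing to one algorithm:
regret_A = max(b - a for a, b in zip(perf_A, perf_B))   # 9.0: A can lose big
regret_B = max(a - b for a, b in zip(perf_A, perf_B))   # 1.0: B never loses much
print(regret_A, regret_B)     # B is minimax superior to A
```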
***
Bill Spears goes on to write:
>>>
Of course, the reader may also be thinking "I don't care about all
problems in the universe, and I sure don't expect to see all of them
with equal probability." Fair enough. So, can we make any statements,
assuming something other than a uniform distribution of problems?
For example, can we state that algorithm X is better on problem
class Y? If so, can we now quickly detect which problem class
our problem is in, and match an algorithm to that class?
>>>
Again, Bill Spears hits the nail on the head. One can always argue
``oh, well in the real world the probability across problems is not
uniform, so the NFL results do not apply, and therefore I'm okay in
using my favorite search algorithm''. However this conclusion does
not follow from the premise. A uniform distribution across
problems is a *typical* distribution. (The uniform average over all
distributions results in the uniform distribution.) It is *not* in any
sense pathological. So the actual distribution might just as easily
be one for which your algorithm is poorly suited as one for which it
is well suited.
Ultimately, the only way to justify one's search algorithm is to argue
in favor of a particular distribution, and then argue that your
algorithm is well suited to that distribution. This is the only (!)
legitimate way of defending a particular search algorithm against the
implications of the NFL theorems.
One should *start* with known properties of the function to be
optimized, and from those properties, *derive* the optimal algorithm
for searching that function. This - the reverse of what is
conventional in the GA community - is the proper way to address search.
The approach of determining which fitness functions are easily searched
by a particular algorithm (say a GA) is only of interest in helping to
solve this more fundamental problem of how to *derive* good algorithms.
***
Nonetheless, it is clearly of interest to derive NFL-type results that
are independent of the distribution over problems. As an example of
such a result, say we fix the fitness function, and then run two
algorithms A and B on that function for N steps. At the end of this,
we examine the population each algorithm has generated, and based on
that choose to continue the search using either algorithm A or B.
The algorithm we use to choose between A and B is a "choosing
procedure". It turns out that *independent of the fitness function*,
if one averages over all A and B, then all choosing procedures are
equal.
So in particular, one has no formal justification for choosing an
algorithm that generated a fit population over one that did not!
***
In addition it's important to emphasize that the natural starting
point for any formal analysis of the search process is the uniform
distribution. In essence, the analysis for such distributions
provides the "skeleton" of the full-blown analysis; giving detailed
non-uniform information concerning the distributions at hand is
"fleshing out the skeleton" for the task at hand.
In addition, the skeleton allows you to do things like derive
mathematical benchmarks for an algorithm's performance. (Any algorithm
not beating such a benchmark is poorly suited to the problem at hand.)
It also leads one into deep and subtle information theoretic and
geometric aspects of the search problem.
All of this, and more, is touched on in our paper.
David Wolpert and Bill Macready
----------------------------------------------------------------------
Genetic Algorithms Digest Tuesday, February 28, 1995 Volume 9 : Issue 10
From: park@cs.bu.edu (Kihong Park)
Date: Thu, 16 Feb 1995 00:29:34 -0500
Subject: comments on no-free-lunch theorem
It seems that too much is being made of the no-free-lunch (NFL) theorem
(Wolpert & Macready), especially, as it pertains to its implications to
combinatorial optimization. In essence, the theorem has very little relevance
to combinatorial optimization, and care should be taken not to make misleading
claims.
For example:
"... if simulated annealing outperforms genetic algorithms on some set PHI,
genetic algorithms outperform simulated annealing on F \ PHI" (p. 5 of their
SFI tech. report) where F is the set of all fitness functions.
"Again, Bill Spears hits the nail on the head. One can always argue
``oh, well in the real world the probability across problems is not
uniform, so the NFL results do not apply, and therefore I'm okay in
using my favorite search algorithm''. However this conclusion does
not follow from the premise. A uniform distribution across
problems is a *typical* distribution. (The uniform average over all
distributions results in the uniform distribution.) It is *not* in any
sense pathological. So the actual distribution might just as easily
be one for which your algorithm is poorly suited as one for which it
is well suited." (Comment by Macready in GA-List v9n7.)
As Wolpert & Macready seem well aware, it is of paramount importance what
probability distribution over F (set of fitness functions) one considers when
deriving results. Their NFL theorem assumes the uniform distribution. Although
the uniform distribution is "typical" and "natural" to use in many contexts,
combinatorial optimization is not one.
However one sets up the optimization problem (including the case W & M
consider, namely, the set of all fitness functions F from some finite set X to
some other finite set Y), "most" functions in F are "random" in a well-defined
sense, but more importantly, they tend to share a common property of either
being all very "difficult", or all very "easy".
Here randomness is w.r.t. Kolmogorov complexity wherein most functions in F
cannot be represented with fewer bits than would be required to encode the
function table. That is, typical functions are incompressible. In the case of
MAX-CLIQUE, it is known that a typical instance (i.e., fitness function) with
n nodes has fitness value ~ 2 log n, but no polynomial algorithm (stochastic
or deterministic) is known which can find a close approximation. In other
problems (e.g., SAT), the opposite is true.
The point is that typical functions under the uniform distribution are
not very interesting for optimization. Furthermore, typical functions under
the uniform distribution will certainly never be encountered in practice, or
even when concocted. Since typical functions are incompressible, whatever
fitness function one will encounter in one's lifetime is bound to be of the
compressible kind. For instance, say that one maps an application problem into
SAT form. Usually, the number of Boolean variables will be < 20000. A "typical"
20000-variable Boolean function needs approximately 2^20000 bits to be
represented. We will never see one. The ones we encounter are, by definition,
atypical ones.
Even in "artificial" settings where one considers, say, "random" Boolean
functions, the random number generator is by definition a minuscule program
compared to 2^20000 (20000 variables is overkill - even with 300 variables
we get super-astronomical numbers), and the resultant function, although
called "random", is again the epitome of atypicalness w.r.t. the uniform
distribution.
Hence any result in combinatorial optimization that appeals to the uniform
distribution over fitness functions is utterly useless. Almost all the
probability measure is contributed by random functions, objects provably of
0-significance to optimization. It is like trying to say something about
the property of rational numbers but formulating a theorem that has the reals
as its sample space. The reals swamp out the rationals, thus yielding useless
results since rationals have measure 0 in this probability space.
It is still very much an open problem how one can characterize the
properties of compressible objects (in essence, the heart of the P = NP
question), and results such as the NFL theorem should be viewed in this
context.
All the claims that if algorithm A does better here then algorithm B must
do better there are at best misleading and serve no purpose.
I hope my comments were of some help in clarifying the issues.
Best regards, Kihong Park
------------------------------
From: Jim Van Zandt
Date: Fri, 17 Feb 1995 11:15:38 -0500
Subject: No Free Lunch Theorems for Search
D.H. Wolpert and W.G. Macready write:
We show that all algorithms that search for an extremum of a cost
function perform exactly the same, when averaged over all possible
cost functions.
I think this misses the point. Consider a fitness function
f(x), where x is a binary chromosome with 64 bits. However, suppose a
malevolent colleague modifies your source code so that what your algorithm
really calculates is f(e(x)), where e(x) is a DES encryption of x. I
think you will agree that the NFL theorem applies to this case - all
optimization algorithms will have equal performance.
However, actual problems differ in this critical respect: Genotypes
and phenotypes are correlated (however loosely). It is this correlation
that permits an optimization algorithm to efficiently search for
an optimum. The simpler the form of correlation, the more efficient
the optimization algorithm which is permitted. I find GAs attractive
because they can exploit very general forms of correlation.
William M. Spears (spears@aic.nrl.navy.mil) writes:
This discussion prompted me to think about why it is that I rarely
modify my GA. Why don't I use uniform crossover, ranking, steady
state, etc.? I realized it is because I had an implicit
conservation law in my head saying "Well, making this modification
will help me on some problems, but hurt me on others, so why
bother?"
It depends on the problems you attack. Two-point crossover lets the GA
exploit correlation in bit placement along the chromosome (where
neighboring bits may "work together" to generate good or bad
performance). Uniform crossover would deny access to such correlation.
If bits are correlated for the encodings you use (e.g. Gray coded
integers) then uniform crossover would be a bad idea.
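For readers unfamiliar with the two operators being contrasted, a minimal sketch (the parent encodings and seed are arbitrary): two-point crossover transplants one contiguous slice, so neighbouring bits travel together, while uniform crossover mixes every position independently.

```python
import random

def two_point(p1, p2, rng):
    """Two-point crossover: the child keeps a contiguous middle slice
    from p2, preserving linkage between neighbouring bits."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    return p1[:i] + p2[i:j] + p1[j:]

def uniform(p1, p2, rng):
    """Uniform crossover: each bit drawn independently from one parent,
    destroying any positional linkage between neighbouring bits."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

rng = random.Random(1)
a, b = [0] * 8, [1] * 8
print(two_point(a, b, rng))   # the 1s form one contiguous block
print(uniform(a, b, rng))     # the 1s are scattered independently
```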
The NFL theorems tell us that we can't hope to find an optimization
algorithm that will work for "all cost functions". I conclude that we
must try to distinguish the particular functions we optimize by
choosing encodings and genetic operators that will correlate the
phenotypes and genotypes in a way that the GA can exploit.
- Jim Van Zandt
------------------------------
From: Tuan.Pham@unsw.EDU.AU
Date: Tue, 21 Feb 1995 12:37:57 +1000
Subject: No free lunch Theorem
Re: No free lunch theorem
If I understand correctly, the NFL theorem says that for
every function that favours some search rule, we can devise
one that doesn't, simply by putting the optimum where it is
least expected.
For example, the following function (f1) can be maximized
by the rule "always go up the slope":
f1 |
| *
| * *
| * *
| * *
| *
|___________________
x
but the rule will fail for f2:
f2 | *
| *
| * *
| * *
| * *
| *
|___________________
x
In practice however, functions of type f1 will be overwhelmingly
more frequent than those of type f2 (and even if a function
like f2 existed we would probably not be interested in
finding the peak at the right, for reasons of stability).
We have seen all kinds of clever test functions being devised
to "stump" search methods - e.g. Shekel's foxholes, Griewangk, Schwefel,
etc. but the truth is, real useful functions look nowhere like that!
So such efforts are probably of academic interest only
and more meaningful research should concentrate on characterising
"real world" functions.
Would the authors of NFL care to comment?
Tuan Pham
Chemical Engineering Dept
University of New South Wales.
------------------------------
From: wgm@santafe.edu (Bill Macready)
Date: Thu, 2 Mar 95 10:15:57 MST
Subject: nfl reply
We would like to make some points in response to the recent discussion
of our no free lunch results for search.
1) First, we want to emphasize that the NFL theorem and its
immediate implications for search are only a relatively small fraction
of our paper. We use it as a *starting point*, for theoretical
investigations never before attempted: investigations of the intrinsic
nature of the search problem.
Many of those subsequent investigations have direct applicability
to real-world problems. For example, some of those investigations
carried out in our paper present several different benchmark measures
for how well a search algorithm is doing. Any claims that an
algorithm does well in some "absolute" sense for a particular problem
that do not compare the algorithm's performance to those (or similar)
benchmarks are, quite simply, without merit.
It should also be noted that some of these "subsequent
investigations" hold for any and all distributions P(f), in which case
the whole issue of whether one should take to heart implications of
the uniform P(f) case is rendered moot. For example, we show that for
any f, there is no formal justification for choosing between two
search algorithms based solely (!) on which has performed best so
far. (Formally, if one averages over all pairs of algorithms, any
procedure for choosing between two algorithms based on performance so
far is just as good as any other, regardless of f.)
In other words, when choosing between algorithms based on their
observed performance it does not suffice to make an assumption about
f; some (currently poorly understood) assumption is also being made
about how the algorithms in question are related to each other and to
f.
2) Nonetheless, in spite of this P(f) independent result, the common
response to the NFL results in the recent GA list postings is that
P(f) is not uniform in practice, and "therefore" the results do not
apply in practice. In fact, whether P(f) is uniform in the real world
is completely irrelevant to the issue at hand. The point of the NFL
results is that it is just as possible that the actual P(f) is *worse*
than uniform, as far as your algorithm is concerned, as it is that it
is better than uniform.
Simply assuming P(f) isn't uniform can't justify an algorithm. You
must instead make the *much* bigger assumption that it doesn't fall
into the half of the space of all P(f) in which your algorithm does
worse than the random search algorithm. AND WITHOUT FORMALLY
JUSTIFYING THAT ASSUMPTION, YOU HAVE NOT JUSTIFIED YOUR ALGORITHM.
3) It should also be pointed out that although many people may *say*
"sure, I know my favorite algorithm's not the best thing since sliced
bread", they *act* as though it were. Their actions betray them. In
practice people do NOT go from knowledge concerning f to a search
algorithm, even if they verbally agree that that is the correct
procedure.
We cannot help but note that there is a very strong correlation
between who-the-researcher-is and what-search-algorithm-is-used. Many
researchers almost always use the same algorithm, *regardless of f or
their knowledge concerning f*. At most, they will "tweak" their
algorithm in hopes of improving it for the f at hand. However there is
ZERO reason not to suspect that for the f at hand it is better to use
a completely different algorithm rather than to spend endless hours
tweaking your favorite pre-fixed one.
For those who don't make their living doing optimization there is
certainly nothing wrong with using a canned GA or SA package. But for
those who make optimization their business (as many who read this list
do) we are simply pointing out that it is foolish to be wed to only
one optimization tool. The NFL results formally prove this. Consider
the impressive progress on problems like TSP and linear programming
where research has proceeded from the cost function to suitable search
algorithms. Compare that progress to research which goes from
algorithms (eg GA's) to hopefully suitable problems.
*******
Kihong Park writes:
Park> care should be taken not to make misleading claims.
We hope that Dr. Park takes his own advice to heart. First of all,
issues of Kolmogorov complexity, with which Dr. Park seems quite
concerned, are in fact irrelevant to all issues concerning search in
finite spaces. Kolmogorov complexity involves Turing machines, which
rely crucially (!!) on the unboundedness of their tape. In finite
spaces, there simply is no such unboundedness, and therefore
considerations concerning Turing machines are irrelevant. We are in a
finite automata regime, loosely speaking.
Park> For instance, say that one maps an application problem into
Park> SAT form. Usually, the number of Boolean variables will be <
Park> 20000. A "typical" 20000-variable Boolean function needs
Park> approximately 2^20000 bits to be represented. We will never see
Park> one. The ones we encounter are, by definition, atypical ones.
Finding a Hamiltonian path in a graph of n vertices requires n^2
Boolean variables when mapped to SAT. 20000 Boolean variables then
corresponds to graphs with only 140 vertices. Can you not conceive of
asking for Hamiltonian paths on graphs with 140 vertices?
We would like to know how Dr. Park justifies the second
statement. It obviously can not be proven, since any such proof would
have to rely on some ad hoc (i.e., unjustified) choice of a
representation scheme. And it is not at all clear that the statement
is true in the real world. Is Dr. Park completely unaware of
statistical selection effects, whereby people (often unconsciously)
pre-select problems so that their system works well on those problems,
and ignore all other problems? How can he be so sure that those
ignored problems are not one of those that he claims "we will never
see"? Is he really so willing to ignore the well-known lessons of
statistical psychology?
Park> Even in "artificial" settings where one considers, say, "random"
Park> Boolean functions, the random number generator is by definition
Park> a minuscule program compared to 2^20000 (20000 variables is
Park> overkill - even with 300 variables we get super-astronomical
Park> numbers), and the resultant function, although called "random",
Park> is again the epitome of atypicalness w.r.t. the uniform
Park> distribution.
Again, this is irrelevant. WHAT IS IMPORTANT IS WHETHER THE
"ATYPICALNESS" OF YOUR PROBLEM MATCHES THE ALGORITHM THAT YOU USE. How
many ways can we say this? The *central* pedagogical point of the nfl
theorem is that you could just as easily have the atypicalness of the
problem at hand "anti-match" your algorithm. Does Dr. Park really
claim that, for example, GA's will do better than random on problems
generated by random number generators, in some sort of "contradiction"
to our central thesis??? We find such a claim ludicrous.
Park> Almost all the probability measure is contributed by random
Park> functions, objects provably of 0-significance to optimization
We would like to know just exactly what Dr. Park means by "provably". We
use math to "prove" our results. What Dr. Park is claiming here simply
can not be "proven" in that sense. Full stop.
********
Jim Van Zandt writes
Zandt> Consider a fitness function f(x), where x is a binary
Zandt> chromosome with 64 bits. However, suppose a malevolent colleague
Zandt> modifies your source code so what your algorithm really
Zandt> calculates is f(e(x)), where e(x) is a DES encryption of x. I
Zandt> think you will agree that the NFL theorem applies to this case
Zandt> - all optimization algorithms will have equal performance.
No they won't! This illustrates a point of confusion. The algorithm
which first decrypts x and then optimizes intelligently will do much
better. This is precisely the point: use knowledge of f to design the
best algorithm!!!
Zandt> I find GAs attractive because they can exploit very general
Zandt> forms of correlation.
*Any* scheme that tries to "exploit very general forms of correlation"
will die a gruesome death chasing its own tail just as readily as it
will succeed. That is what we prove (!!). "Exploiting general forms of
correlation" is NOT an unalloyed blessing. That's the whole point of
the NFL results. If the correlations you are "exploiting" are not
present in f then at best (!) you can hope to perform as well as a
random algorithm. And since you don't know precisely what correlations
are exploited by a GA, you run the risk that they are not present in f. If
you know nothing of f, the NFL theorem states that they are as likely
to be absent as present, so you are running a large risk.
Zandt> I conclude that we must try to distinguish the particular
Zandt> functions we optimize by choosing encodings and genetic
Zandt> operators that will correlate the phenotypes and genotypes in
Zandt> a way that the GA can exploit.
Sure. But better yet, don't be so wed to a GA. See above.
*******
Tuan.Pham@unsw.EDU.AU writes
Pham> If I understand correctly, the NFL theorem says that for
Pham> every function that favours some search rule, we can devise
Pham> one that doesn't, simply by putting the optimum where it is
Pham> least expected.
Pham>
Pham> For example, the following function (f1) can be maximized
Pham> by the rule "always go up the slope":
Pham>
Pham> f1 |
Pham> | *
Pham> | * *
Pham> | * *
Pham> | * *
Pham> | *
Pham> |___________________
Pham> x
Pham>
Pham> but the rule will fail for f2:
Pham>
Pham> f2 | *
Pham> | *
Pham> | * *
Pham> | * *
Pham> | * *
Pham> | *
Pham> |___________________
Pham> x
Although this is true, our result is far stronger. Note that a hill
climber will do well for both f2 and f1, even if it's not optimal for
f2. We're saying even this (the hill-climber doing well) does not hold in
general. There are f's for which the hill-climber will do *worse than
random search*. And there are (very roughly speaking) just as many of
those f's as there are f's like f1 and f2, for which the algorithm does
better than random search.
In fact, our result isn't even restricted to seeing how good the
best-point-searched-so-far is, as is implicit in Dr. Pham's message. For
*any* functional of the set of all points searched-so-far, any two
algorithms are equal, on average.
Pham> In practice however, functions of type f1 will be overwhelmingly
Pham> more frequent than those of type f2
HOW DO YOU KNOW THIS?
A large part of the reason for our emphasizing the NFL theorem (rather than
our subsequent results in our paper) is that people are so prone to making
these kinds of unsubstantiated statements.
Pham> We have seen all kinds of clever test functions being devised
Pham> to "stump" search methods - e.g. Shekel's foxholes, Griewangk,
Pham> Schwefel, etc. but the truth is, real useful functions look nowhere
Pham> like that!
But you don't have to be "clever" - be dumb! It is hard *not* to create
test functions that "stump" search methods.
And again, how do you know "real useful functions look nowhere like
that"?
REMEMBER, EVEN IF THE FITNESS FUNCTION *IS* RANDOM, A SEARCH ALGORITHM
WILL IMPROVE THE FITNESS OF THE MEMBERS OF ITS GENERATIONS THE LONGER
IT SEARCHES. So the *obvious* question is whether the improved
performance of our favorite algorithms is better on real-world
problems than the improvement accompanying a random algorithm.
The only (!) way to answer this question is with the benchmarks we
provide (or similar ones). We are currently conducting empirical
investigations of this issue.
Pham> research should concentrate on characterising "real world" functions.
We couldn't agree more. But that's only the first half of the program. The
second is to derive a calculus by which "characteristics" of the function
at hand are fed in and the optimal (!) search algorithm for those
characteristics is then produced.
Bill Macready and David Wolpert
------------------------------
From: spears@AIC.NRL.Navy.Mil (William M. Spears)
Date: Tue, 7 Mar 95 13:55:04 EST
Subject: NFL thoughts...
Greetings all,
I will not attempt to address the discussions about
uniform distributions. However, I think David and Bill
have raised some valid questions about how SOME (not all)
research is done in the GA community. I will try to
summarize some of MY interpretation below:
(1) One should be careful to not simply state that an algorithm
is better than another, without talking about a distribution over
fitness functions.
Upon reflection I can remember numerous occasions when
someone has said to make such-and-such change (steady-state,
uniform crossover, real-valued chromosomes, higher mutation rates,
adaptive mutation rates, elitism, gray codes, etc.) to my GA, because
it will work better (period). No mention is made of which functions
it will work better on. Also, I've started to go back over
proceedings and find that some papers have had the following
format: GAs don't do too well on De Jong's test suite - make a
change to the GA that improves it on that test suite - conclude
one should make that change to the GA in general.
In short, this is something we probably shouldn't do. What
should we do?
(2) If we want to make general changes to the GA (or invent new
algorithms out of the blue) we need to attempt to answer the
following questions (at least):
a) What problems will this algorithm work well with?
b) What problems will this algorithm work poorly with?
c) How likely is it that we'll see these problems (i.e. do we
have some a priori information?).
d) Is there a way to examine a problem and detect whether a
given algorithm will work well or poorly on it?
Number (2) implies that we need to develop problem generators,
where problems can METHODICALLY be changed (as opposed to the
rather random assortment of problems we often use). For example,
my work with boolean satisfiability can allow us to use the random
problem generators from other communities (Steiner systems, graph
problems, max-clique problems, etc...) to create problems for our
GAs. Simply sticking with random 3-SAT problems, we can easily
create problems where the size of the problem and the amount of epistasis
are varied in a controlled fashion. I will not go into details here,
because it isn't my intention to push only my particular ideas (I'll
be happy to interact via email with anyone who is interested). My
intention is to suggest we start working on creating meaningful
problem generators, to help us understand the biases of our algorithms.
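Spears deliberately omits the details of his generator, but the general shape of such a thing is easy to sketch. The following is a generic random 3-SAT generator (my illustration, not his code; all names are invented): the number of variables and the clause/variable ratio are the knobs that vary problem size and difficulty methodically.

```python
import random

def random_3sat(n_vars, n_clauses, seed=None):
    """Random 3-SAT generator: each clause contains 3 distinct
    variables, each negated independently with probability 1/2."""
    rng = random.Random(seed)
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, n_vars + 1), 3))
            for _ in range(n_clauses)]

def sat_fitness(clauses, assignment):
    """A natural GA fitness: the fraction of clauses satisfied.
    assignment[i] is the truth value of variable i + 1."""
    def true(lit):
        value = assignment[abs(lit) - 1]
        return value if lit > 0 else not value
    return sum(any(true(l) for l in c) for c in clauses) / len(clauses)

# A clause/variable ratio near 4.25 is known to sit in the hard region
# for random 3-SAT.
problem = random_3sat(n_vars=20, n_clauses=85, seed=1)
score = sat_fitness(problem, [True] * 20)
assert 0.0 <= score <= 1.0
```

Holding the generator fixed and sweeping its parameters gives exactly the methodical family of problems the paragraph above asks for, instead of a random assortment of hand-picked functions.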
Cheers, Bill
------------------------------
Genetic Algorithms Digest Thursday, March 16, 1995 Volume 9 : Issue 14
From: J.Kingdon@cs.ucl.ac.uk
Date: Fri, 10 Mar 95 15:37:42 +0000
Subject: NFL (Re: v9n12)
NFL etc..
I agree with most of the points made by Bill and David, however there are
a couple of points -
Bill Macready and David Wolpert - write:
M&W> 1) First, we want to emphasize that the NFL theorem and its
M&W> immediate implications for search are only a relatively small fraction
M&W> of our paper. We use it as a *starting point*, for theoretical
M&W> investigations never before attempted: investigations of the intrinsic
M&W> nature of the search problem.
I disagree here - you just haven't spoken to the right people. For a
while now we have been working on the basis that all search algorithms
make a trade-off between the time they take to find a solution and the
number of assumptions they make about the solution they are trying to
find. So, for example, a random search and an enumerative search make
no assumptions about the space they are searching and pay with long
running times. Conversely,
the X=1 search algorithm (which simply returns the point X=1) makes a
massive assumption not only about the search space but about the fitness
function. It runs extremely fast, but the quality of its solution
is usually very bad (except by coincidence). In between
these two extremes is the broad space of search algorithms that most of
us use (GAs, SA, EAs, Hillclimbing etc. etc.). But again all of these
algorithms make assumptions, and are therefore biased when confronted
with an arbitrary fitness landscape. In this context deceptive problems
(problems designed to mislead a search heuristic) are algorithm
dependent. To illustrate this point, note that for a binary 1-bit hillclimber,
the modality (number of peaks) of the fitness landscape is dependent on
the binary encoding of the search space (not that unimodality is a guarantee
of good convergence - see Horn). In this sense search problems are more like
*lock and key* problems, in which ideally you want to use a search algorithm
that is making the right kind of assumptions about the landscape it is
searching (i.e., the right kind of key for a given lock). Moreover, on
this basis, for a real-world optimisation problem the best strategy (as
measured by some minimax criterion - see below) is to run multiple search
heuristics, on the chance that one pays off.
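The encoding-dependence of modality mentioned above can be checked directly. The sketch below is my own throwaway illustration (not from Horn or the NFL paper; all names invented): it counts the local optima a 1-bit hillclimber sees for the very same unimodal integer function under the standard binary encoding versus a Gray encoding.

```python
def gray(i):
    """Standard reflected Gray code of integer i."""
    return i ^ (i >> 1)

def count_local_optima(fitness, bits, encode):
    """Count points that no single bit-flip of their encoding improves.
    `encode` must be a bijection on range(2**bits)."""
    n = 1 << bits
    decode = {encode(i): i for i in range(n)}
    count = 0
    for i in range(n):
        code = encode(i)
        neighbours = (decode[code ^ (1 << b)] for b in range(bits))
        if all(fitness(i) >= fitness(j) for j in neighbours):
            count += 1
    return count

bits = 4
f = lambda x: -(x - 5) ** 2      # unimodal on the integer line, peak at x = 5
standard = count_local_optima(f, bits, encode=lambda i: i)
gray_coded = count_local_optima(f, bits, encode=gray)
# The same function presents a different number of peaks to the
# hillclimber, depending only on how the search space is encoded.
assert standard != gray_coded
```

Because consecutive integers are Hamming neighbours under Gray coding, the single peak on the integer line survives in this small instance, while the standard encoding manufactures extra Hamming-local optima - the "lock and key" point exactly.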
Dr Pham writes:
>> Pham> research should concentrate on characterising "real world" functions.
M&W> We couldn't agree more. But that's only the first half of the program. The
M&W> second is to derive a calculus by which "characteristics" of the function
M&W> at hand are fed in and the optimal (!) search algorithm for those
M&W> characteristics is then produced.
This seems very optimistic, and seems to contradict your own theorem!
Even if such a calculus existed, what it would imply is that on the basis
of a sample of M points from the search space the calculus would deliver the
best means (possibly an algorithm) for determining the next point you should
sample. This algorithm is already included in NFL. Therefore, as you
point out (in your paper), we are left analysing the minimax properties of
a given search algorithm, which again as you point out, is extremely open.
Moreover, the results achieved by a minimax criterion depend on
which minimax criterion is laid down.
Jason Kingdon
------------------------------