Genetic Algorithms Digest       Tuesday, February 14, 1995       Volume 9 : Issue 7

- Do NOT send email or reply to gadistr@aic.nrl.navy.mil (GA List Moderator)
- Send submissions (articles) to GA-List@AIC.NRL.NAVY.MIL
- Send administrative (subscribe, unsubscribe, change of address, etc.) requests to GA-List-Request@AIC.NRL.NAVY.MIL

******************************************************************************
- You can access back issues, GA code, conference announcements, etc., either through the WWW at URL http://www.aic.nrl.navy.mil/galist/ or through anonymous ftp at ftp.aic.nrl.navy.mil [192.26.18.68] in /pub/galist.
******************************************************************************

From: wgm@santafe.edu (Bill Macready)
Date: Tue, 7 Feb 95 19:02:44 MST
Subject: paper available - No Free Lunch Theorems for Search

                   No Free Lunch Theorems for Search

                      D.H. Wolpert, W.G. Macready

We show that all algorithms that search for an extremum of a cost function perform exactly the same when averaged over all possible cost functions. In particular, if algorithm A outperforms algorithm B on some cost functions then, loosely speaking, there must exist exactly as many other functions where B outperforms A. Starting from this we analyze a number of other a priori characteristics of the search problem, such as its geometry and its information-theoretic aspects. This analysis allows us to derive mathematical benchmarks for assessing a particular search algorithm's performance. We also investigate minimax aspects of the search problem, the validity of using characteristics of a partial search over a cost function to predict future behavior of the search algorithm on that cost function, and time-varying cost functions. We conclude with some discussion of the justifiability of biologically-inspired search methods.
(32 pages)

To obtain an electronic copy of this paper:

    ftp ftp.santafe.edu
    login: anonymous
    password:
    cd /pub/wgm
    get nfl.ps
    quit

Then at your system:

    lpr -P

If you have trouble getting any of these papers electronically, you can request a hard copy from Deborah Smith (drs@santafe.edu), Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, USA, 87501.

Bill Macready
wgm@santafe.edu

------------------------------------------------------------------------------

From: William M. Spears (spears@aic.nrl.navy.mil)
Date: Wed, 17 Aug 1994 (Excerpt from my August message)
Subject: Conservation Law for search techniques

This discussion prompted me to think about why it is that I rarely modify my GA. Why don't I use uniform crossover, ranking, steady state, etc.? I realized it is because I had an implicit conservation law in my head saying "Well, making this modification will help me on some problems, but hurt me on others, so why bother?" So, I started wondering whether my implicit assumption was true.

For example, suppose I consider two search techniques: random search and enumerative search. Then, over all possible problems, I think one could state that (using a reasonable performance measure) enumerative search has better expected performance, if all problems are equally likely. The reason appears to be simple: random search samples with replacement, while enumerative search does not.

Now, suppose we only consider stochastic search algorithms that sample with replacement. Can we make any strong statements? For example, will the addition of crossover help you as often as it hurts you? Is binary coding in any sense "better" than another coding?

Of course, the reader may also be thinking "I don't care about all problems in the universe, and I sure don't expect to see all of them with equal probability." Fair enough. So, can we make any statements, assuming something other than a uniform distribution of problems?
For example, can we state that algorithm X is better on problem class Y? If so, can we now quickly detect which problem class our problem is in, and match an algorithm to that class?

I certainly do not pretend to know the answers to these questions, but I thought some of you may be interested in thinking about these issues.

Bill
Spears@aic.nrl.navy.mil

------------------------------

From: wgm@santafe.edu (Bill Macready)
Date: Thu, 9 Feb 95 18:29:07 MST
Subject: Re: NFL/GA-List

Recently we announced a paper on "no-free-lunch" (NFL) theorems for search. Concerning this topic William Spears says:

>>> consider two search techniques: random search and enumerative search. Then, over all possible problems I think that one could state that (using a reasonable performance measure) enumerative search has better expected performance, if all problems are equally likely. The reason appears to be simple: random search samples with replacement, while enumerative search does not. Now, suppose we only consider stochastic search algorithms that sample with replacement. Can we make any strong statements? For example, will the addition of crossover help you as often as hurt you? Is binary coding in any sense "better" than another coding? >>>

Bill Spears is exactly right in his intuition that sampling with replacement is inferior to sampling without replacement, if one measures an algorithm's performance by the total number of queries it has made of the fitness function (in addition, of course, to the fitness of the population associated with those queries). For this reason, we found that it makes sense to measure an algorithm's performance by the number of distinct (!) queries it has made of the fitness function. With such a measure, an algorithm won't be penalized simply because it happens to resample a point it has already visited. In our paper, we adopt such a measure.

The result is the following "no-free-lunch" (NFL) theorem: Average over all fitness functions.
Then for any histogram of fitness values c, all search algorithms have EXACTLY the same probability of generating a population whose fitness values are described by c.

So in particular, if we are interested in maximizing fitness, we look at the bin containing the highest fitness value. Since all algorithms produce the same histogram, they produce the SAME maximum fitness. If one is interested (I don't know why) in the average fitness over time produced by the algorithm, then that too is identical for all algorithms.

A concrete if somewhat surprising example: I'd like to maximize a fitness function and I want to compare hill-climbing to hill-descending. There are as many (loosely speaking) fitness functions where hill-descending wins as there are where hill-climbing wins! So Bill Spears is right that crossover will help as readily as it hurts. And binary coding is no better than any other coding.

On the other hand, there may indeed be minimax distinctions between algorithms, and these distinctions may very well end up preferring stochastic algorithms. To explain this, consider two search algorithms, A and B. Say we are interested in the average fitness across populations generated by those algorithms. Then we know that according to this measure, averaged over all fitness functions, A equals B - the associated populations have the same average fitness. That's what the no-free-lunch theorem says. On the other hand, it may well be that for most fitness functions A beats B by just a small amount, while for the few remaining fitness functions B beats A by a lot. (Hence the averages are the same.) Now in this case, *no matter what the fitness function*, we will never go far wrong by using algorithm B, and sometimes we will have a big win (in comparison to using A). However the same is not true for A. In this sense, one could argue that B is minimax superior to A. Little is currently known about how common minimax distinctions between algorithms are.
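[The histogram claim above can be checked exhaustively on a toy space. The sketch below is illustrative only, not from the Wolpert-Macready paper: the two deterministic algorithms and the 4-point domain are arbitrary choices. It enumerates every fitness function f : {0,1,2,3} -> {0,1,2} and confirms that a "climber" and a "descender" that never revisit points generate exactly the same histogram of sampled fitness values.]

```python
from collections import Counter
from itertools import product

X = range(4)                  # search space: 4 points
Y = range(3)                  # possible fitness values

def run(alg, f, m):
    """Visit m distinct points, starting at x = 0; return sorted fitnesses."""
    visited = [0]
    for _ in range(m - 1):
        visited.append(alg(visited, f))
    return tuple(sorted(f[x] for x in visited))

def climber(visited, f):
    # step to the unvisited point nearest the BEST point seen so far
    best = max(visited, key=lambda x: f[x])
    return min((x for x in X if x not in visited), key=lambda x: abs(x - best))

def descender(visited, f):
    # step to the unvisited point nearest the WORST point seen so far
    worst = min(visited, key=lambda x: f[x])
    return min((x for x in X if x not in visited), key=lambda x: abs(x - worst))

# Average over ALL fitness functions f : X -> Y (3^4 = 81 of them).
hist_climb = Counter(run(climber, f, 3) for f in product(Y, repeat=len(X)))
hist_desc = Counter(run(descender, f, 3) for f in product(Y, repeat=len(X)))
print(hist_climb == hist_desc)   # identical histograms: no free lunch
```

[Any other pair of deterministic, non-revisiting algorithms gives the same counts, which is exactly the content of the theorem.]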
One can conceive of possible minimax advantages a stochastic algorithm might have, but it's a completely open question!

***

Bill Spears goes on to write:

>>> Of course, the reader may also be thinking "I don't care about all problems in the universe, and I sure don't expect to see all of them with equal probability." Fair enough. So, can we make any statements, assuming something other than a uniform distribution of problems? For example, can we state that algorithm X is better on problem class Y? If so, can we now quickly detect which problem class our problem is in, and match an algorithm to that class? >>>

Again, Bill Spears hits the nail on the head. One can always argue ``oh, well in the real world the probability across problems is not uniform, so the NFL results do not apply, and therefore I'm okay in using my favorite search algorithm''. However this conclusion does not follow from that premise. A uniform distribution across problems is a *typical* distribution. (The uniform average over all distributions results in the uniform distribution.) It is *not* in any sense pathological. So the actual distribution might just as easily be one for which your algorithm is poorly suited as one for which it is well suited.

Ultimately, the only way to justify one's search algorithm is to argue in favor of a particular distribution, and then argue that your algorithm is well suited to that distribution. This is the only (!) legitimate way of defending a particular search algorithm against the implications of the NFL theorems. One should *start* with known properties of the function to be optimized, and from those properties *derive* the optimal algorithm for searching that function. This - the reverse of what is conventional in the GA community - is the proper way to address search.
The approach of determining which fitness functions are easily searched by a particular algorithm (say a GA) is only of interest in helping to solve this more fundamental problem of how to *derive* good algorithms.

***

Nonetheless, it is clearly of interest to derive NFL-type results that are independent of the distribution over problems. As an example of such a result, say we fix the fitness function, and then run two algorithms A and B on that function for N steps. At the end of this, we examine the population each algorithm has generated, and based on that choose to continue the search using either algorithm A or B. The algorithm we use to choose between A and B is a "choosing procedure". It turns out that *independent of the fitness function*, if one averages over all A and B, then all choosing procedures are equal. So in particular, one has no formal justification for choosing an algorithm that generated a fit population over one that did not!

***

In addition, it's important to emphasize that any formal analysis of the search process naturally starts with uniform distributions. In essence, the analysis for such distributions provides the "skeleton" of the full-blown analysis; supplying detailed non-uniform information concerning the distributions at hand is "fleshing out the skeleton" for the task at hand. In addition, the skeleton allows you to do things like derive mathematical benchmarks for an algorithm's performance. (Any algorithm not beating such a benchmark is poorly suited to the problem at hand.) It also leads one into deep and subtle information-theoretic and geometric aspects of the search problem. All of this, and more, is touched on in our paper.
David Wolpert and Bill Macready

----------------------------------------------------------------------

Genetic Algorithms Digest       Tuesday, February 28, 1995       Volume 9 : Issue 10

From: park@cs.bu.edu (Kihong Park)
Date: Thu, 16 Feb 1995 00:29:34 -0500
Subject: comments on no-free-lunch theorem

It seems that too much is being made of the no-free-lunch (NFL) theorem (Wolpert & Macready), especially as it pertains to its implications for combinatorial optimization. In essence, the theorem has very little relevance to combinatorial optimization, and care should be taken not to make misleading claims. For example:

"... if simulated annealing outperforms genetic algorithms on some set PHI, genetic algorithms outperform simulated annealing on F \ PHI" (pp. 5 of their SFI tech. report)

where F is the set of all fitness functions. And:

"Again, Bill Spears hits the nail on the head. One can always argue ``oh, well in the real world the probability across problems is not uniform, so the NFL results do not apply, and therefore I'm okay in using my favorite search algorithm''. However this conclusion does not follow from that premise. A uniform distribution across problems is a *typical* distribution. (The uniform average over all distributions results in the uniform distribution.) It is *not* in any sense pathological. So the actual distribution might just as easily be one for which your algorithm is poorly suited as one for which it is well suited." (Comment by Macready in GA-List v9n7.)

As Wolpert & Macready seem well aware, it is of paramount importance what probability distribution over F (the set of fitness functions) one considers when deriving results. Their NFL theorem assumes the uniform distribution. Although the uniform distribution is "typical" and "natural" to use in many contexts, combinatorial optimization is not one of them.
However one sets up the optimization problem (including the case W & M consider, namely the set F of all fitness functions from some finite set X to some other finite set Y), "most" functions in F are "random" in a well-defined sense, but more importantly, they tend to share a common property of either being all very "difficult" or all very "easy". Here randomness is w.r.t. Kolmogorov complexity, wherein most functions in F cannot be represented with fewer bits than would be required to encode the function table. That is, typical functions are incompressible.

In the case of MAX-CLIQUE, it is known that a typical instance (i.e., fitness function) with n nodes has fitness value ~ 2 log n, but no polynomial algorithm (stochastic or deterministic) is known which can find a close approximation. In other problems (e.g., SAT), the opposite is true. The point is that typical functions under the uniform distribution are not very interesting for optimization.

Furthermore, typical functions under the uniform distribution will certainly never be encountered in practice, or even when concocted. Since typical functions are incompressible, whatever fitness function one encounters in one's lifetime is bound to be of the compressible kind. For instance, say that one maps an application problem into SAT form. Usually, the number of Boolean variables will be < 20000. A "typical" 20000-variable Boolean function needs approximately 2^20000 bits to be represented. We will never see one. The ones we encounter are, by definition, atypical ones. Even in "artificial" settings where one considers, say, "random" Boolean functions, the random number generator is by definition a minuscule program compared to 2^20000 bits (20000 variables is overkill - even with 300 variables we get super-astronomical numbers), and the resultant function, although called "random", is again the epitome of atypicalness w.r.t. the uniform distribution.
Hence any result in combinatorial optimization that appeals to the uniform distribution over fitness functions is utterly useless. Almost all the probability measure is contributed by random functions, objects provably of 0-significance to optimization. It is like trying to say something about the properties of rational numbers by formulating a theorem that has the reals as its sample space. The reals swamp out the rationals, yielding useless results, since the rationals have measure 0 in this probability space.

It is still very much an open problem how one can characterize the properties of compressible objects (in essence, the heart of the P = NP question), and results such as the NFL theorem should be viewed in this context. Claims of the form "if algorithm A does better here then algorithm B must do better there" are at best misleading and serve no purpose.

I hope my comments were of some help in clarifying the issues.

Best regards,
Kihong Park

------------------------------

From: Jim Van Zandt
Date: Fri, 17 Feb 1995 11:15:38 -0500
Subject: No Free Lunch Theorems for Search

D.H. Wolpert and W.G. Macready write:

   We show that all algorithms that search for an extremum of a cost
   function perform exactly the same, when averaged over all possible
   cost functions.

I think this misses the point. Consider a fitness function f(x), where x is a binary chromosome with 64 bits. Now suppose a malevolent colleague modifies your source code so that what your algorithm really calculates is f(e(x)), where e(x) is a DES encryption of x. I think you will agree that the NFL theorem applies to this case - all optimization algorithms will have equal performance. However, actual problems differ in this critical respect: genotypes and phenotypes are correlated (however loosely). It is this correlation that permits an optimization algorithm to search efficiently for an optimum. The simpler the form of the correlation, the more efficient the optimization algorithm that is permitted.
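[Van Zandt's encryption point can be simulated directly. The sketch below is illustrative only: a random permutation of the domain stands in for DES encryption, and the 12-bit domain and step budgets are arbitrary choices. A bit-flip hill climber does well on a smooth "count the ones" function, but poorly once the domain is scrambled and the genotype/fitness correlation is destroyed.]

```python
import random

rng = random.Random(1)
n = 12                         # 12-bit chromosomes
N = 1 << n

def onemax(x):
    return bin(x).count("1")   # smooth: neighbors have similar fitness

sigma = list(range(N))
rng.shuffle(sigma)             # random permutation standing in for e(x)

def scrambled(x):
    return onemax(sigma[x])    # f(e(x)): correlation destroyed

def hill_climb(f, steps):
    x = rng.randrange(N)
    for _ in range(steps):
        y = x ^ (1 << rng.randrange(n))   # flip one random bit
        if f(y) >= f(x):
            x = y
    return f(x)

trials, steps = 300, 100
smooth = sum(hill_climb(onemax, steps) for _ in range(trials)) / trials
rough = sum(hill_climb(scrambled, steps) for _ in range(trials)) / trials
print(smooth, rough)   # smooth lands near the maximum of 12; rough does not
```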
I find GAs attractive because they can exploit very general forms of correlation.

William M. Spears (spears@aic.nrl.navy.mil) writes:

   This discussion prompted me to think about why it is that I rarely
   modify my GA. Why don't I use uniform crossover, ranking, steady
   state, etc.? I realized it is because I had an implicit conservation
   law in my head saying "Well, making this modification will help me
   on some problems, but hurt me on others, so why bother?"

It depends on the problems you attack. Two-point crossover lets the GA exploit correlation in bit placement along the chromosome (where neighboring bits may "work together" to generate good or bad performance). Uniform crossover would deny access to such correlation. If bits are correlated for the encodings you use (e.g. Gray-coded integers) then uniform crossover would be a bad idea.

The NFL theorems tell us that we can't hope to find an optimization algorithm that will work for "all cost functions". I conclude that we must try to distinguish the particular functions we optimize by choosing encodings and genetic operators that will correlate the phenotypes and genotypes in a way that the GA can exploit.

                                                - Jim Van Zandt

------------------------------

From: Tuan.Pham@unsw.EDU.AU
Date: Tue, 21 Feb 1995 12:37:57 +1000
Subject: No free lunch Theorem

Re: No free lunch theorem

If I understand correctly, the NFL theorem says that for every function that favours some search rule, we can devise one that doesn't, simply by putting the optimum where it is least expected.
For example, the following function (f1) can be maximized by the rule "always go up the slope":

 f1 |
    |      *
    |    *   *
    |  *       *
    | *         *
    | *
    |___________________
                      x

but the rule will fail for f2:

 f2 |                  *
    |                  *
    |      *        *
    |    *    *
    |  *        *
    | *
    |___________________
                      x

In practice, however, functions of type f1 will be overwhelmingly more frequent than those of type f2 (and even if a function like f2 existed we would probably not be interested in finding the peak at the right, for reasons of stability). We have seen all kinds of clever test functions devised to "stump" search methods - e.g. Shekel's foxholes, Griewangk, Schwefel, etc. - but the truth is, real useful functions look nowhere like that! So such efforts are probably of academic interest only, and more meaningful research should concentrate on characterising "real world" functions.

Would the authors of NFL care to comment?

Tuan Pham
Chemical Engineering Dept
University of New South Wales

------------------------------

From: wgm@santafe.edu (Bill Macready)
Date: Thu, 2 Mar 95 10:15:57 MST
Subject: nfl reply

We would like to make some points in response to the recent discussion of our no free lunch results for search.

1) First, we want to emphasize that the NFL theorem and its immediate implications for search are only a relatively small fraction of our paper. We use it as a *starting point* for theoretical investigations never before attempted: investigations of the intrinsic nature of the search problem. Many of those subsequent investigations have direct applicability to real-world problems. For example, some of the investigations carried out in our paper present several different benchmark measures for how well a search algorithm is doing. Any claims that an algorithm does well in some "absolute" sense for a particular problem that do not compare the algorithm's performance to those (or similar) benchmarks are, quite simply, without merit.
It should also be noted that some of these "subsequent investigations" hold for any and all distributions P(f), in which case the whole issue of whether one should take to heart the implications of the uniform P(f) case is rendered moot. For example, we show that for any f, there is no formal justification for choosing between two search algorithms based solely (!) on which has performed best so far. (Formally, if one averages over all pairs of algorithms, any procedure for choosing between two algorithms based on performance so far is just as good as any other, regardless of f.) In other words, when choosing between algorithms based on their observed performance, it does not suffice to make an assumption about f; some (currently poorly understood) assumption is also being made about how the algorithms in question are related to each other and to f.

2) Nonetheless, in spite of this P(f)-independent result, the common response to the NFL results in the recent GA-List postings is that P(f) is not uniform in practice, and "therefore" the results do not apply in practice. In fact, whether P(f) is uniform in the real world is completely irrelevant to the issue at hand. The point of the NFL results is that it is just as possible that the actual P(f) is *worse* than uniform, as far as your algorithm is concerned, as it is that it is better than uniform. Simply assuming P(f) isn't uniform can't justify an algorithm. You must instead make the *much* bigger assumption that it doesn't fall into the half of the space of all P(f) in which your algorithm does worse than the random search algorithm. AND WITHOUT FORMALLY JUSTIFYING THAT ASSUMPTION, YOU HAVE NOT JUSTIFIED YOUR ALGORITHM.

3) It should also be pointed out that although many people may *say* "sure, I know my favorite algorithm's not the best thing since sliced bread", they *act* as though it were. Their actions betray them.
In practice people do NOT go from knowledge concerning f to a search algorithm, even if they verbally agree that that is the correct procedure. We cannot help but note that there is a very strong correlation between who-the-researcher-is and what-search-algorithm-is-used. Many researchers almost always use the same algorithm, *regardless of f or their knowledge concerning f*. At most, they will "tweak" their algorithm in hopes of improving it for the f at hand. However there is ZERO reason not to suspect that for the f at hand it is better to use a completely different algorithm rather than to spend endless hours tweaking your favorite pre-fixed one.

For those who don't make their living doing optimization there is certainly nothing wrong with using a canned GA or SA package. But for those who make optimization their business (as many who read this list do) we are simply pointing out that it is foolish to be wed to only one optimization tool. The NFL results formally prove this. Consider the impressive progress on problems like TSP and linear programming, where research has proceeded from the cost function to suitable search algorithms. Compare that progress to research which goes from algorithms (e.g. GAs) to hopefully suitable problems.

*******

Kihong Park writes:

Park> care should be taken not to make misleading claims.

We hope that Dr. Park takes his own advice to heart. First of all, issues of Kolmogorov complexity, with which Dr. Park seems quite concerned, are in fact irrelevant to all issues concerning search in finite spaces. Kolmogorov complexity involves Turing machines, which rely crucially (!!) on the unboundedness of their tape. In finite spaces there simply is no such unboundedness, and therefore considerations concerning Turing machines are irrelevant. We are in a finite-automata regime, loosely speaking.

Park> For instance, say that one maps an application problem into
Park> SAT form. Usually, the number of Boolean variables will be <
Park> 20000.
Park> A "typical" 20000-variable Boolean function needs
Park> approximately 2^20000 bits to be represented. We will never see
Park> one. The ones we encounter are, by definition, atypical ones.

Finding a Hamiltonian path in a graph of n vertices requires n^2 Boolean variables when mapped to SAT. 20000 Boolean variables then corresponds to graphs with only ~140 vertices. Can you not conceive of asking for Hamiltonian paths on graphs with 140 vertices?

We would also like to know how Dr. Park justifies his second statement. It obviously cannot be proven, since any such proof would have to rely on some ad hoc (i.e., unjustified) choice of a representation scheme. And it is not at all clear that the statement is true in the real world. Is Dr. Park completely unaware of statistical selection effects, whereby people (often unconsciously) pre-select problems so that their system works well on those problems, and ignore all other problems? How can he be so sure that those ignored problems are not among those he claims "we will never see"? Is he really so willing to ignore the well-known lessons of statistical psychology?

Park> Even in "artificial" settings where one considers, say, "random"
Park> Boolean functions, the random number generator is by definition
Park> a minuscule program compared to 2^20000 (20000 variables is
Park> overkill - even with 300 variables we get super-astronomical
Park> numbers), and the resultant function, although called "random",
Park> is again the epitome of atypicalness w.r.t. the uniform
Park> distribution.

Again, this is irrelevant. WHAT IS IMPORTANT IS WHETHER THE "ATYPICALNESS" OF YOUR PROBLEM MATCHES THE ALGORITHM THAT YOU USE. How many ways can we say this? The *central* pedagogical point of the NFL theorem is that you could just as easily have the atypicalness of the problem at hand "anti-match" your algorithm. Does Dr.
Park really claim that, for example, GAs will do better than random on problems generated by random number generators, in some sort of "contradiction" to our central thesis??? We find such a claim ludicrous.

Park> Almost all the probability measure is contributed by random
Park> functions, objects provably of 0-significance to optimization

We would like to know just exactly what Dr. Park means by "provably". We use math to "prove" our results. What Dr. Park is claiming here simply cannot be "proven" in that sense. Full stop.

********

Jim Van Zandt writes:

Zandt> Consider a fitness function f(x), where x is a binary
Zandt> chromosome with 64 bits. However, suppose a malevolent colleague
Zandt> modifies your source code so what your algorithm really
Zandt> calculates is f(e(x)), where e(x) is a DES encryption of x. I
Zandt> think you will agree that the NFL theorem applies to this case
Zandt> - all optimization algorithms will have equal performance.

No they won't! This illustrates a point of confusion. The algorithm which first decrypts x and then optimizes intelligently will do much better. This is precisely the point: use knowledge of f to design the best algorithm!!!

Zandt> I find GAs attractive because they can exploit very general
Zandt> forms of correlation.

*Any* scheme that tries to "exploit very general forms of correlation" will die a gruesome death chasing its own tail just as readily as it will succeed. That is what we prove (!!). "Exploiting general forms of correlation" is NOT an unalloyed blessing. That's the whole point of the NFL results. If the correlations you are "exploiting" are not present in f, then at best (!) you can hope to perform as well as a random algorithm. And since you don't know precisely what correlations are exploited by a GA, you run the risk that they are not present in f. If you know nothing of f, the NFL theorem states that since they're as likely to be absent as present, you're running a large risk.
Zandt> I conclude that we must try to distinguish the particular
Zandt> functions we optimize by choosing encodings and genetic
Zandt> operators that will correlate the phenotypes and genotypes in
Zandt> a way that the GA can exploit.

Sure. But better yet, don't be so wed to a GA. See above.

*******

Tuan.Pham@unsw.EDU.AU writes:

Pham> If I understand correctly, the NFL theorem says that for
Pham> every function that favours some search rule, we can devise
Pham> one that doesn't, simply by putting the optimum where it is
Pham> least expected.
Pham>
Pham> For example, the following function (f1) can be maximized
Pham> by the rule "always go up the slope":
Pham>
Pham>  f1 |
Pham>     |      *
Pham>     |    *   *
Pham>     |  *       *
Pham>     | *         *
Pham>     | *
Pham>     |___________________
Pham>                       x
Pham>
Pham> but the rule will fail for f2:
Pham>
Pham>  f2 |                  *
Pham>     |                  *
Pham>     |      *        *
Pham>     |    *    *
Pham>     |  *        *
Pham>     | *
Pham>     |___________________
Pham>                       x

Although this is true, our result is far stronger. Note that a hill climber will do well for both f2 and f1, even if it's not optimal for f2. We're saying even this (the hill-climber doing well) does not hold in general. There are f's for which the hill-climber will do *worse than random search*. And there are (very roughly speaking) just as many of those f's as there are f's like f1 and f2, for which the algorithm does better than random search. In fact, our result isn't even restricted to seeing how good the best-point-searched-so-far is, as is implicit in Dr. Pham's message. For *any* functional of the set of all points searched so far, any two algorithms are equal, on average.

Pham> In practice however, functions of type f1 will be overwhelmingly
Pham> more frequent than those of type f2

HOW DO YOU KNOW THIS? A large part of the reason for our emphasizing the NFL theorem (rather than our subsequent results in our paper) is that people are so prone to making these kinds of unsubstantiated statements.
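[The existence of f's where a hill climber loses to random search is easy to demonstrate. The sketch below is an illustrative construction, not from the paper: a "trap" function whose slope points away from an isolated global optimum. Under an equal budget of evaluations, random search wins on average.]

```python
import random

rng = random.Random(2)
N = 100
f = [N - 1 - x for x in range(N)]   # slope leads toward x = 0 ...
f[N - 1] = 1000                     # ... away from the needle at x = 99

def hill_climb(budget):
    """Best value seen while greedily following the (misleading) slope."""
    x = rng.randrange(N)
    best = f[x]
    for _ in range(budget - 1):
        y = min(max(x + rng.choice((-1, 1)), 0), N - 1)
        best = max(best, f[y])
        if f[y] >= f[x]:
            x = y
    return best

def random_search(budget):
    return max(f[rng.randrange(N)] for _ in range(budget))

trials, budget = 500, 50
hc = sum(hill_climb(budget) for _ in range(trials)) / trials
rs = sum(random_search(budget) for _ in range(trials)) / trials
print(rs > hc)   # random search beats the hill climber on this f
```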
Pham> We have seen all kinds of clever test functions being devised
Pham> to "stump" search methods - e.g. Shekel's foxholes, Griewangk,
Pham> Schwefel, etc. but the truth is, real useful functions look nowhere
Pham> like that!

But you don't have to be "clever" - be dumb! It is hard *not* to create test functions that "stump" search methods. And again, how do you know "real useful functions look nowhere like that"? REMEMBER, EVEN IF THE FITNESS FUNCTION *IS* RANDOM, A SEARCH ALGORITHM WILL IMPROVE THE FITNESS OF THE MEMBERS OF ITS GENERATIONS THE LONGER IT SEARCHES. So the *obvious* question is whether the improved performance of our favorite algorithms is better on real-world problems than the improvement accompanying a random algorithm. The only (!) way to answer this question is with the benchmarks we provide (or similar ones). We are currently conducting empirical investigations of this issue.

Pham> research should concentrate on characterising "real world" functions.

We couldn't agree more. But that's only the first half of the program. The second is to derive a calculus by which "characteristics" of the function at hand are fed in and the optimal (!) search algorithm for those characteristics is then produced.

Bill Macready and David Wolpert

------------------------------

From: spears@AIC.NRL.Navy.Mil (William M. Spears)
Date: Tue, 7 Mar 95 13:55:04 EST
Subject: NFL thoughts...

Greetings all,

I will not attempt to address the discussions about uniform distributions. However, I think David and Bill have raised some valid questions about how SOME (not all) research is done in the GA community. I will try to summarize some of MY interpretation below:

(1) One should be careful not to simply state that an algorithm is better than another, without talking about a distribution over fitness functions.
Upon reflection I can remember numerous occasions when someone has
said to make such-and-such change (steady-state, uniform crossover,
real-valued chromosomes, higher mutation rates, adaptive mutation
rates, elitism, Gray codes, etc.) to my GA, because it will work
better (period). No mention is made of which functions it will work
better on. Also, I've started to go back over proceedings and find
that some papers have had the following format: GAs don't do too well
on De Jong's test suite - make a change to the GA that improves it on
that test suite - conclude one should make that change to the GA in
general. In short, this is something we probably shouldn't do. What
should we do?

(2) If we want to make general changes to the GA (or invent new
algorithms out of the blue) we need to attempt to answer the following
questions (at least):

 a) What problems will this algorithm work well on?
 b) What problems will this algorithm work poorly on?
 c) How likely is it that we'll see these problems (i.e., do we have
    some a priori information)?
 d) Is there a way to examine a problem and detect whether a given
    algorithm will work well or poorly on it?

Number (2) implies that we need to develop problem generators, where
problems can METHODICALLY be changed (as opposed to the rather random
assortment of problems we often use). For example, my work with
boolean satisfiability allows us to use the random problem generators
from other communities (Steiner systems, graph problems, max-clique
problems, etc.) to create problems for our GAs. Simply sticking with
Random 3-SAT problems, we can easily create problems where the size of
the problem and the amount of epistasis is modified in a controlled
fashion. I will not go into details here, because it isn't my
intention to push only my particular ideas (I'll be happy to interact
via email with anyone who is interested).
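[Editor's note: as a concrete sketch of the kind of generator Spears describes - illustrative Python, not Spears's actual code; all names are invented - here is a Random 3-SAT generator with the two controllable knobs he mentions: problem size (number of variables) and constraint tightness (number of clauses). The clause-to-variable ratio is the standard difficulty dial for Random 3-SAT.]

```python
import random

def random_3sat(num_vars, num_clauses, seed=None):
    """Generate a Random 3-SAT instance: each clause picks 3 distinct
    variables, each negated with probability 1/2.  Literal v means
    'variable v is true'; -v means 'variable v is false'."""
    rng = random.Random(seed)
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, num_vars + 1), 3))
            for _ in range(num_clauses)]

def sat_fitness(assignment, clauses):
    """A natural GA fitness: the fraction of clauses satisfied by a
    truth assignment (assignment[v] is the value of variable v)."""
    def sat(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return sum(any(sat(lit) for lit in c) for c in clauses) / len(clauses)

# Size and tightness are independent, methodically adjustable knobs:
clauses = random_3sat(num_vars=20, num_clauses=85, seed=1)
all_true = {v: True for v in range(1, 21)}
print(sat_fitness(all_true, clauses))  # some fraction in [0, 1]
```

Holding `num_vars` fixed and sweeping `num_clauses` varies constraint interaction in a controlled way, which is exactly the methodical variation (as opposed to an ad hoc test suite) that Spears is arguing for.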
My intention is to suggest we start working on creating meaningful
problem generators, to help us understand the biases of our
algorithms.

Cheers,
Bill

------------------------------

Genetic Algorithms Digest  Thursday, March 16, 1995  Volume 9 : Issue 14

From: J.Kingdon@cs.ucl.ac.uk
Date: Fri, 10 Mar 95 15:37:42 +0000
Subject: NFL (Re: v9n12)

NFL etc.

I agree with most of the points made by Bill and David; however, there
are a couple of points. Bill Macready and David Wolpert write:

M&W> 1) First, we want to emphasize that the NFL theorem and its
M&W> immediate implications for search are only a relatively small
M&W> fraction of our paper. We use it as a *starting point*, for
M&W> theoretical investigations never before attempted: investigations
M&W> of the intrinsic nature of the search problem.

I disagree here - you just haven't spoken to the right people. For a
while now we have been working on the basis that all search algorithms
make a trade between the time they take to find a solution and the
number of assumptions they make about the solution they are trying to
find. So, for example, a random search and an enumerative search make
no assumptions about the space they are searching, and pay in long
running times. Conversely, the X=1 search algorithm makes a massive
assumption about not only the search space, but the fitness function.
It runs extremely fast, but the quality of its solution is usually
very bad (except by coincidence). In between these two extremes is the
broad space of search algorithms that most of us use (GAs, SA, EAs,
hillclimbing, etc.). But again, all of these algorithms make
assumptions, and are therefore biased when confronted with an
arbitrary fitness landscape. In this context deceptive problems
(problems designed to mislead a search heuristic) are algorithm
dependent.
To illustrate this point, note that for a binary 1-bit hillclimber,
the modality (number of peaks) of the fitness landscape depends on the
binary encoding of the search space (not that unimodality is a
guarantee of good convergence - see Horn). In this sense search
problems are more like *lock and key* problems, in which ideally you
want to use a search algorithm that is making the right kind of
assumptions about the landscape it is searching (i.e., the right kind
of key for a given lock). Moreover, on this basis, for a real-world
optimisation problem the best strategy (as measured by some minimax
criterion - see below) is multiple search heuristics, on the chance
that one pays off.

Dr Pham writes:

Pham> research should concentrate on characterising "real world" functions.

M&W> We couldn't agree more. But that's only the first half of the
M&W> program. The second is to derive a calculus by which
M&W> "characteristics" of the function at hand are fed in and the
M&W> optimal (!) search algorithm for those characteristics is then
M&W> produced.

This seems very optimistic, and seems to contradict your own theorem!
Even if such a calculus existed, what it would imply is that on the
basis of an M sample from the search space the calculus would deliver
the best means (possibly an algorithm) for determining the next point
you should sample. This algorithm is already included in NFL.
Therefore, as you point out (in your paper), we are left analysing the
minimax properties of a given search algorithm, which again, as you
point out, is extremely open. Moreover, what results are achieved by a
minimax criterion depend on what minimax criterion is laid down.

Jason Kingdon

------------------------------
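[Editor's note: Kingdon's opening observation - that the number of peaks seen by a 1-bit hillclimber depends on the encoding - can be checked directly. The sketch below (illustrative Python, not from any of the correspondents) counts the points of a small integer function at which no single genotype bit flip improves fitness, under the standard binary encoding and under a reflected Gray code.]

```python
def gray(x):
    """Standard reflected binary Gray code of integer x."""
    return x ^ (x >> 1)

def count_peaks(f_values, encode, bits):
    """Count points where no single bit flip of the encoded genotype
    improves fitness - the traps for a 1-bit hillclimber."""
    code_to_x = {encode(x): x for x in range(len(f_values))}
    peaks = 0
    for x in range(len(f_values)):
        nbs = (code_to_x.get(encode(x) ^ (1 << b)) for b in range(bits))
        if all(nb is None or f_values[nb] <= f_values[x] for nb in nbs):
            peaks += 1
    return peaks

# A function that is unimodal on the integers 0..15 (single peak at x=7)...
f = [min(x, 14 - x) for x in range(16)]
# ...has two peaks under plain binary (the Hamming cliff between 7=0111
# and 8=1000 leaves x=8 a local optimum too), but only one under Gray
# coding, where consecutive integers are always 1-bit neighbours:
print(count_peaks(f, lambda x: x, bits=4), count_peaks(f, gray, bits=4))
# -> 2 1
```

Same fitness function, same hillclimber, different lock-and-key fit: the encoding alone decides whether the landscape is unimodal.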