Abstract: |
This paper presents an improved heuristic technique based on a method that we developed in 2003 [1]. That original method, Formula Prediction using Genetic Algorithms (FPEG), introduced an algorithm for generating formulas directly from datasets. The method presented here enhances the power and flexibility of the original model, resulting in a better formula-generation tool. It differs from the original in formula structure and expressive power: it has a larger alphabet and new independent internal variables, which allow better recognition of patterns in the data and better overall performance. The method uses simulated annealing to generate mathematical equations that fit a set of input data to a function. The data are represented as a set of input and output values collected from the system under consideration. When the input set contains a sufficiently large number of independent variables, this problem can become intractable, as it is NP-hard. To keep the comparison valid, Advanced Formula Prediction using Simulated Annealing (AFP) was compared against FPEG on the same regression benchmarks used for the original method. The results show that the new algorithm improves on the original algorithm's performance by 2.65 percentage points on average. This corresponds to an average reduction of the error margin of 52.10 percent, and the difference is statistically significant. In addition, the technique used during the search to encode the strings that represent candidate formulas was enhanced to give it more expressive power.