ElsevierSoftwareX/SOFTX_2020_38

Geometric Semantic Genetic Programming into GPU

This is a C/C++/CUDA implementation of a geometric semantic genetic programming algorithm.

Parameters:

Edit the parameters in the configuration.ini file to set the desired evolutionary conditions.

| Name | Values | Description |
|---|---|---|
| 1. Number of generations | 1024 | Total number of iterations of the main evolutionary loop. |
| 2. Population Size | 1024 | Number of individuals in the population; also specifies the number of auxiliary random trees used in the GSGP mutation operation. |
| 3. Max Individual Length | 1024 | Size (number of genes) of each individual in the population and the auxiliary population. |
| 4. FunctionRatio | 0.5 | Probability of selecting a function when generating an individual; otherwise a terminal element is selected. |
| 5. VariableRatio | 0.5 | Probability of selecting a variable when generating an individual; otherwise a constant terminal is selected. |
| 6. Maximum Random Constant | 10 | Maximum value of the random constants used. Whatever the value, negative constants of the same magnitude are also generated. |
| 7. Log Path | log/ | Directory where the files generated by GSGP-CUDA will be stored. |
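
To illustrate how FunctionRatio, VariableRatio, and the maximum random constant interact, the following is a minimal CPU-side C++ sketch of drawing the genes of one individual. It is not the repository's CUDA kernel: the `Gene` struct, the `GeneKind` enum, the function names, and the `numFunctions` and `numVariables` arguments are all hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical gene encoding, used only for this illustration.
enum class GeneKind { Function, Variable, Constant };

struct Gene {
    GeneKind kind;
    float value;  // function id, variable index, or constant value
};

// Draws one gene: a function with probability functionRatio; otherwise a
// terminal, which is a variable with probability variableRatio and a
// constant in [-maxRandomConstant, maxRandomConstant) otherwise.
Gene drawGene(std::mt19937 &rng, float functionRatio, float variableRatio,
              float maxRandomConstant, int numFunctions, int numVariables) {
    assert(numFunctions > 0 && numVariables > 0 && maxRandomConstant > 0.0f);
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);
    if (unit(rng) < functionRatio) {
        std::uniform_int_distribution<int> pick(0, numFunctions - 1);
        return {GeneKind::Function, static_cast<float>(pick(rng))};
    }
    if (unit(rng) < variableRatio) {
        std::uniform_int_distribution<int> pick(0, numVariables - 1);
        return {GeneKind::Variable, static_cast<float>(pick(rng))};
    }
    std::uniform_real_distribution<float> constant(-maxRandomConstant,
                                                   maxRandomConstant);
    return {GeneKind::Constant, constant(rng)};
}

// Fills an individual of fixed length (the "Max Individual Length" parameter).
std::vector<Gene> drawIndividual(std::mt19937 &rng, int length,
                                 float functionRatio, float variableRatio,
                                 float maxRandomConstant, int numFunctions,
                                 int numVariables) {
    std::vector<Gene> genes;
    genes.reserve(length);
    for (int i = 0; i < length; ++i) {
        genes.push_back(drawGene(rng, functionRatio, variableRatio,
                                 maxRandomConstant, numFunctions, numVariables));
    }
    return genes;
}
```

With the default values from the table, a call such as `drawIndividual(rng, 1024, 0.5f, 0.5f, 10.0f, 4, 3)` would yield roughly half function genes, with the remaining terminals split evenly between variables and constants in the range -10 to 10.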

Data Description:

The values in the problem data files must be separated by a single blank space (" "), not by commas (",").

Software code languages, tools, and services used

C/C++/CUDA, CUBLAS

Compilation requirements, operating environments & dependencies

CUDA Toolkit v10.1 and v9.2, GCC v7.4.0, CUBLAS v2.0, Linux headers, Unix-like systems, Ubuntu Linux 18.04

How to compile.

nvcc -std=c++11 -O0 GsgpCuda.cu -o GsgpCuda.x  -lcublas

To run GsgpCuda, you must provide the training file, the test file, and a name for the generated output model, as shown in the example.

./GsgpCuda.x -train_data <train_file_name>.txt -test_data <test_file_name>.txt -output_model <model_name>

<train_file_name>.txt: This file must contain the training data used to compute fitness, given as space-separated values with n columns, where the first n-1 columns are the input features and the last column is the target variable.
<test_file_name>.txt: This file must contain the test data used to evaluate the best individual at each generation. These data do not influence the evolutionary/training process and must be given in the same format as the training data.
If no test data are available, the training file can be passed as the test file so that the program executes properly; this does not affect model evolution.
<model_name>.csv: This file contains the information needed to apply the best model found by GsgpCuda to new data (to make new predictions or inferences). Other auxiliary files are also generated and required.
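
As an illustration of the data format described above, here is a minimal C++ sketch that parses one line of a training or test file into input features and a target. It is not taken from the GsgpCuda sources; the `Sample` struct and the `parseSample` function are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one row: n-1 input features plus the target.
struct Sample {
    std::vector<float> features;
    float target;
};

// Parses one space-separated line. Returns false when the line holds fewer
// than two numeric values or contains anything that is not a number
// (for example, commas used as separators).
bool parseSample(const std::string &line, Sample &out) {
    std::istringstream in(line);
    std::vector<float> values;
    float v;
    while (in >> v) {
        values.push_back(v);
    }
    // If parsing stopped before the end of the line, a non-numeric token
    // such as "," was encountered.
    if (values.size() < 2 || !in.eof()) {
        return false;
    }
    out.target = values.back();  // last column is the target variable
    values.pop_back();
    out.features = values;       // first n-1 columns are the input features
    return true;
}
```

A line such as `0.5 1.5 2.0` would be read as two features (0.5 and 1.5) with target 2.0, while `0.5,1.5,2.0` would be rejected by this sketch.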

GsgpCuda generates the following output files, which are stored in the log folder:

    <model_name>_initialPopulation.csv: This file will store the individuals of the initial population.
    <model_name>_randomTrees.csv: This file will store the individuals of the auxiliary population.
    <model_name>_fitnessTrain.csv: This file will store the error of the best individual in each generation with training data.
    <model_name>_fitnessTest.csv: This file will store the error of the best individual in each generation with test data.
    <model_name>_processing_time.csv: This file stores the processing times in seconds of the various modules of the algorithm. 


How to make new predictions (inference) with the best model found

To make new predictions with the best model generated by GsgpCuda, pass three command-line arguments: the name of the model (-model), the name of the data file (-input_data), and the name of the file in which to save the output values generated by the model (-prediction_output).

./GsgpCuda.x -model <model_name> -input_data <new_data>.txt -prediction_output <predicted_values>.csv

<model_name>: This file contains the information needed to apply the model generated by GsgpCuda.
<new_data>.txt: This file must contain the new, unseen data on which the model is applied. It must have the same number of columns as there are input features in the training file used to build the model, without the target column.
<predicted_values>.csv: This file will store the output values generated by the model when applied to the data in <new_data>.txt.
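
If the new data happen to come from a file in the training format, the target column has to be removed before the file is passed to -input_data. The following C++ sketch shows one way to do this for a single line; `dropTargetColumn` is a hypothetical helper written for this example and is not part of GsgpCuda.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the line without its last whitespace-separated column, with the
// remaining columns joined by single spaces. A line with zero or one column
// yields an empty string.
std::string dropTargetColumn(const std::string &line) {
    std::istringstream in(line);
    std::string token;
    std::string previous;
    std::string result;
    bool havePrevious = false;
    while (in >> token) {
        // Emit the previous token only once we know another one follows it,
        // so the final token (the target) is never written.
        if (havePrevious) {
            if (!result.empty()) {
                result += ' ';
            }
            result += previous;
        }
        previous = token;
        havePrevious = true;
    }
    return result;
}
```

Applied to every line of a training-format file, this would produce a file with only the input features, which matches the layout described for <new_data>.txt.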


How to run unit tests for the main GsgpCUDA kernels

How to compile the unit tests for the kernels that initialize the population.

 nvcc -std=c++11 -O0 testInitialPopulation.cu -o testInitialPopulation.x

How to run.

./testInitialPopulation.x

How to compile the unit tests for the kernels that calculate the semantics.

 nvcc -std=c++11 -O0 testSemantic.cu -o testSemantic.x 

How to run.

./testSemantic.x 

How to compile the unit tests for the kernels that execute the geometric semantic mutation operator.

 nvcc -std=c++11 -O0 gsmTest.cu -o gsmTest.x

How to run.

./gsmTest.x
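
For readers unfamiliar with the operator exercised by this test, the following C++ sketch shows geometric semantic mutation as it is usually defined in the GSGP literature, applied directly to semantic vectors (the outputs of each tree on every fitness case): the child equals the parent plus a mutation step times the difference of two random trees. This is a CPU illustration under that textbook definition, not the repository's kernel; the function name and the `mutationStep` argument are assumptions, and any additional bounding of the random trees' outputs that the implementation may apply is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes child[i] = parent[i] + mutationStep * (randomTree1[i] - randomTree2[i])
// for every fitness case i. All three inputs are semantic vectors of equal
// length, one entry per fitness case.
std::vector<float> geometricSemanticMutation(const std::vector<float> &parent,
                                             const std::vector<float> &randomTree1,
                                             const std::vector<float> &randomTree2,
                                             float mutationStep) {
    assert(parent.size() == randomTree1.size());
    assert(parent.size() == randomTree2.size());
    std::vector<float> child(parent.size());
    for (std::size_t i = 0; i < parent.size(); ++i) {
        child[i] = parent[i] + mutationStep * (randomTree1[i] - randomTree2[i]);
    }
    return child;
}
```

Because the operator works on semantics alone, the two random trees can be drawn from a precomputed auxiliary population, which is consistent with the auxiliary random trees mentioned in the parameter list.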

Documentation:

The library is documented with Doxygen. It has been implemented so that it can be used after a very quick reading of the documentation.
