A Comparative Study on Machine Learning Algorithms for Knowledge Discovery

This repository is the official implementation of "A Comparative Study on Machine Learning Algorithms for Knowledge Discovery."

🚀 Update: The paper has been accepted at the 17th International Conference on Control, Automation, Robotics and Vision (ICARCV 2022).

Overview: The paper summarizes key research in symbolic regression and compares the methods to understand the strengths and limitations of each. It then highlights the challenges in current methods and future research directions for applying machine learning to knowledge discovery.

A. Requirements

Install Miniconda to manage the experiments' dependencies.

To install the requirements, create the conda environment from environment.yml:

conda env create -f environment.yml

B. Datasets

Feynman-03: All equations with up to 3 input variables were sampled from the AI-Feynman dataset. The resulting dataset contains 52 equations.
Nguyen-12: The dataset consists of 12 equations with at most 2 input variables. Notably, a few equations contain terms such as x^6 and x^5, which were included to test the methods' ability to capture high-frequency terms.
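As an illustration of how a benchmark equation becomes training data, the sketch below samples points from Nguyen-4, one of the Nguyen equations with x^6 and x^5 terms. The function names, the input range [-1, 1], and the noise model (zero-mean Gaussian noise whose standard deviation equals the noise level) are assumptions made for this example; the repository's own sampling code may differ.

```python
import random


def nguyen_4(x):
    """Nguyen-4 benchmark target: x^6 + x^5 + x^4 + x^3 + x^2 + x."""
    return x**6 + x**5 + x**4 + x**3 + x**2 + x


def sample_points(equation, num_points, noise=0.0, low=-1.0, high=1.0, seed=0):
    """Draw num_points inputs uniformly from [low, high] and return (xs, ys).

    ASSUMPTION: `noise` is the standard deviation of zero-mean Gaussian noise
    added to each target value. The repository may define noise differently.
    """
    rng = random.Random(seed)
    # All inputs are drawn first, so the same seed gives the same inputs
    # regardless of the noise level.
    xs = [rng.uniform(low, high) for _ in range(num_points)]
    ys = [equation(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys
```

Calling sample_points(nguyen_4, num_points=1000, noise=0.1) would mirror the noise and num_points arguments of the benchmark command, under the noise model assumed here.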

C. Baselines

Genetic programming (GPL): The Python library gplearn was used to perform genetic programming (GP).
Deep symbolic regression (DSR): An autoregressive approach that uses a reinforcement-learning search to explore the space of symbolic expressions. The open-source implementation by Petersen et al. was used with default parameters for the benchmark tasks.
AIFeynman (AIF): A heuristic-based search approach that exploits recurring patterns in the symbolic formulas that describe natural phenomena. A custom wrapper was written around the open-source Python package, AIFeynman, for the evaluation.
Neural Symbolic Regression that Scales (NeSymRes): A symbolic language-modelling approach pretrained on a large distribution of millions of equations. The model pretrained on 100 million equations was used for the benchmark.
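To give a feel for the simplest of these baselines, here is a minimal sketch of fitting gplearn's SymbolicRegressor to a toy target (Nguyen-1, x^3 + x^2 + x). The helper names and all hyperparameter values are assumptions chosen for illustration; they are not taken from this repository's configs, and the other baselines are not shown.

```python
import random


def make_dataset(num_points=200, seed=0):
    """Sample (X, y) from the toy target y = x^3 + x^2 + x on [-1, 1]."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0)] for _ in range(num_points)]
    y = [row[0] ** 3 + row[0] ** 2 + row[0] for row in X]
    return X, y


def fit_gp(X, y, seed=0):
    """Fit a gplearn SymbolicRegressor and return the fitted estimator."""
    # Imported lazily so make_dataset works even without gplearn installed.
    from gplearn.genetic import SymbolicRegressor

    # ASSUMPTION: illustrative hyperparameters, not the repository's settings.
    est = SymbolicRegressor(
        population_size=1000,
        generations=20,
        function_set=("add", "sub", "mul"),
        parsimony_coefficient=0.001,
        random_state=seed,
    )
    est.fit(X, y)
    return est
```

After est = fit_gp(*make_dataset()), printing est._program shows the best expression found, and est.predict(X) evaluates it on new inputs. Restricting function_set to add, sub, and mul keeps the search space small for a polynomial target; a real benchmark run would likely need a broader set.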

D. Benchmark

The baseline models can be benchmarked using the following command and arguments:

Models: gpl, dsr, aif, nesymres
Datasets: feynman, nguyen

make <DATASET-NAME>-<MODEL-NAME> noise=<NOISE-LEVEL> num_points=<NUM-POINTS>

For example, to run the benchmark for the gpl model on the feynman dataset with noise=0.1 and num_points=1000, run:

make feynman-gpl noise=0.1 num_points=1000
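To run several configurations in a row, the command template above can be generated programmatically. The sketch below builds the make invocations for every dataset and model pair listed above; the helper names and the noise levels used in the usage note are assumptions for illustration, not values prescribed by the repository.

```python
from itertools import product

# Model and dataset names as accepted by the Makefile targets.
MODELS = ["gpl", "dsr", "aif", "nesymres"]
DATASETS = ["feynman", "nguyen"]


def build_command(dataset, model, noise, num_points):
    """Return the make invocation as an argument list."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    return [
        "make",
        f"{dataset}-{model}",
        f"noise={noise}",
        f"num_points={num_points}",
    ]


def sweep(noise_levels, num_points):
    """Build one command per (dataset, model, noise level) combination."""
    return [
        build_command(dataset, model, noise, num_points)
        for dataset, model, noise in product(DATASETS, MODELS, noise_levels)
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root. For example, sweep([0.0, 0.1], 1000) yields 16 commands, the second of which is the feynman-gpl run shown above.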

E. Results

Figure 1: Effect of noise on accuracy on the Feynman-03 dataset

Figure 2: Effect of noise on accuracy on the Nguyen-12 dataset

Other results can be found in the results and discussion section of the paper.

F. Citation

If you find this work useful in your research, please consider citing:

@inproceedings{,
  title={A Comparative Study on Machine Learning Algorithms for Knowledge Discovery},
  author={Siddesh Sambasivam Suseela and Yang Feng and Kezhi Mao},
  booktitle={17th International Conference on Control, Automation, Robotics and Vision (ICARCV 2022)},
  year={2022},
  organization={Nanyang Technological University}
}

G. Contact

For any questions, please contact Siddesh Sambasivam Suseela (siddeshsambasivam.official@gmail.com).
