Using Heterogeneous Model Ensembles to Improve the Prediction of Yeast Contamination in Peppermint

https://doi.org/10.1016/j.procs.2022.01.319
Under a Creative Commons license
open access

Abstract

In this paper, we present a heterogeneous ensemble modeling approach to learn predictors for yeast contamination in freshly harvested peppermint batches. Our research is based on data about numerous parameters of the harvesting process, such as planting, tillage, fertilization, harvesting, and drying, as well as information about microbial contamination. We use several different machine learning methods, namely random forests, gradient boosting trees, symbolic regression by genetic programming, and support vector machines, to learn models that predict contamination on the basis of the available harvesting parameters. Using those models, we form model ensembles in order to improve the accuracy and to reduce the false negative rate, i.e., to overlook as few contaminations as possible. As we summarize in this paper, ensemble modeling indeed helps to increase the prediction accuracy for our application, especially when using only the best models. The final prediction accuracy, as well as other statistical indicators such as the false negative rate and the false positive rate, depends on the choice of the discrimination threshold; in the optimal case, model ensembles are able to predict yeast contamination with 65.91% accuracy, with only 19.15% of the samples being false negatives, i.e., overlooked contaminations.
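
The dependence of accuracy, false negative rate, and false positive rate on the discrimination threshold can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each model outputs a contamination score between 0 and 1 and that the ensemble simply averages those scores; the abstract does not specify the combination rule. All scores, labels, and function names below are hypothetical. The false negative rate is computed here as FN / (FN + TP) and the false positive rate as FP / (FP + TN), which may differ from the definition behind the percentages reported above.

```python
from statistics import mean


def ensemble_scores(model_scores):
    """Average the per-sample scores of several models (one list per model)."""
    return [mean(per_sample) for per_sample in zip(*model_scores)]


def evaluate(scores, labels, threshold):
    """Label a sample as contaminated (1) if its score reaches the threshold,
    then return (accuracy, false negative rate, false positive rate)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    accuracy = (tp + tn) / len(labels)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fnr, fpr


# Hypothetical toy data: 1 = contaminated batch, 0 = clean batch.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical scores standing in for three of the model types named above.
random_forest = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1]
gradient_boosting = [0.8, 0.4, 0.5, 0.6, 0.3, 0.2]
support_vector = [0.7, 0.5, 0.4, 0.2, 0.1, 0.3]

combined = ensemble_scores([random_forest, gradient_boosting, support_vector])

# Compare a stricter and a more permissive discrimination threshold.
for threshold in (0.45, 0.30):
    acc, fnr, fpr = evaluate(combined, labels, threshold)
    print(f"threshold={threshold:.2f} accuracy={acc:.3f} FNR={fnr:.3f} FPR={fpr:.3f}")
```

In this toy data, lowering the threshold removes the one missed contamination at the cost of one false alarm, which is the kind of trade-off the choice of discrimination threshold controls.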

Keywords

yeast contamination
herbs
machine learning
heterogeneous model ensembles
