Original Research Article
Enhanced decision tree induction using evolutionary techniques for Parkinson's disease classification

https://doi.org/10.1016/j.bbe.2022.07.002

Abstract

The diagnosis of Parkinson's disease (PD) is important in neurological pathology for appropriate medical therapy. Algorithms based on decision tree induction (DTI) have been widely used for diagnosing PD through biomedical voice disorders. However, DTI for PD diagnosis is based on a greedy search algorithm, which can cause overfitting and inferior solutions. This paper improved the performance of DTI using evolutionary-based genetic algorithms. The goal was to combine evolutionary techniques, namely, a genetic algorithm (GA) and genetic programming (GP), with a decision tree algorithm (J48) to improve the classification performance. The developed model was applied to a real biomedical dataset for the diagnosis of PD. The results showed that the accuracy of J48 was improved from 80.51% to 89.23% and to 90.76% using the GA and GP, respectively.

Introduction

Voice disorders might arise due to physiological diseases that are commonly observed in patients with Parkinson's disease [1], and speech clinicians are able to measure voice functions objectively with acoustic tools [2]. Parkinson’s disease (PD) is a neurodegenerative disease that is characterised by the abnormal formation of Lewy bodies in the brain [3]. PD causes the deterioration of motor function [4], including dysphonia, which is an impairment of the operation of the phonatory system [5], and communication problems known as dysarthria [6], [7]; it also affects the rate and length of utterances [8]. It is the second most critical age-related neurodegenerative disorder after Alzheimer’s disease, with a prevalence ranging from 41 per 100,000 population for those aged 40 years and above to 1,900 per 100,000 population for those over 80 years of age [9], [10].

Considering the trend of increasing life expectancy across the globe [11], it is paramount that PD be diagnosed early so that this effort can be translated into swift and timely treatment [12], [13]. For example, the acoustic analysis of vowels can be applied to extract the minimum average maximum (MAMa) tree and singular value decomposition to diagnose PD [14]. The significance of the acoustic signal increases when it is considered in its totality as a way to comprehend the uniqueness of speech changes [15]. Likewise, speech signals are an appropriate biomarker for measuring the severity of PD, whether at mild or moderate levels [16]. In another investigation, oral diadochokinesis, which quantifies acoustic changes, was used to predict the level of speech impairment [17]. A multi‑level analysis was also performed using an optimized k‑nearest neighbours model for the binary classification of PD [18]. In another experiment, a combination of linear discriminant analysis (LDA) and support vector machine (SVM) was proposed in order to decrease the dimensionality of various speech features in PD [19]. Empirical mode decomposition (EMD) has also been proposed for the extraction of vocal characteristics. These characteristics are then classified by SVM and random forest (RF), which is commonly used for binary classifications [20]. The EMD method decomposes a non-stationary signal into a series of intrinsic mode functions, and the extracted features are thereafter fed into classifiers such as SVM and RF [21]. The SVM-trained Unified PD Rating Scale Motor Examination of Speech (UPDRS-S), which is a collection of speech samples, is obtained from measurements of respiration, phonation, articulation and prosody [22]. Recently, deep neural networks (DNNs) have been used to develop the latest speaker recognition system for detecting PD at an early stage [23]. Briefly, Mel-frequency cepstral coefficients (MFCCs), which contain information related to articulation and phonation, have been used with DNNs to extract embedded x-vectors. According to Jeancolas, Petrovska-Delacretaz [23], x-vectors robustly represent the characteristics of speakers to discriminate between people with early-stage PD and healthy individuals. Studies on amplitude tremor frequencies, which include intensity and power indices, have shown an increase in sustained vowels recorded from people who were diagnosed with PD and were off medication [24].

An incremental machine learning (ML) technique was proposed to overcome the deficiency of supervised methods in the prediction of UPDRS [12]. Additionally, ML was used to identify new biomarkers of PD by quantifying the symptomatic effects on voice parameters and tracking disease severity [25]. Therefore, in neurological pathology, it is common for ML algorithms and data mining tasks to be applied to clinical databases to evaluate the early diagnosis of PD [26], predict the progression of PD [27], classify microelectrode record signals in PD patients to alleviate errors in deep brain stimulation surgery [28], and so forth.

The variety and range of techniques and algorithms that have been developed for the diagnosis of Parkinson’s disease have enhanced the prediction and classification of dysphonia features for PD. Therefore, choosing an appropriate technique poses a major challenge [29], especially for acoustic datasets. The ML models that have been applied to voice recordings for the diagnosis of PD are outlined, with a brief overview of the results, in [30]. Indeed, the studies listed in [30] illustrate various works that focused on improving the classification of PD on similar datasets, including the dataset by Max Little of the University of Oxford. This dataset was retrieved from the University of California Irvine (UCI) Machine Learning Repository and will be discussed in depth in a later section.

Dysphonia feature presentation techniques were reviewed in this paper to demonstrate how they differ from the method that will be presented in Section 3. Sharma, Sundaram [13] found that the modified grey wolf optimization (MGWO) algorithm is a suitable strategy for feature selection, obtaining an accuracy of 93.87% with RF for the classification of PD. Lahmiri, Dawson [31] evaluated the radial basis function neural network (RBFNN), SVM and several other ML methods. Their experiment on the identification of PD showed that the SVM achieved a higher performance than all the other methods, and that the RBFNN needed a large dataset to obtain better results. Khan, Mendes [32] developed a system for PD diagnosis using Cartesian genetic programming (CGP) to evolve a multi-dimensional wavelet neural network, and achieved an accuracy of 90.13%. Little, McSharry [33] proposed a new method for constructing features based on the calculation of traditional (Kay Pentax Multi-Dimensional Voice Program), non-standard (correlation dimension D2) and pitch period entropy (PPE) measures. Then, a Gaussian kernel density method was applied to select high correlation features, and the SVM was used thereafter for the classification. They found a combination of non-standard and traditional features, namely HNR, RPDE, DFA and PPE, and achieved an accuracy of 91.4%. Sakar and Kursun [34] used maximum relevance minimum redundancy (mRMR) to assess the relevance of features for PD. This study built a classification model using SVM and obtained an accuracy of 92.75 ± 1.21% with bootstrap resampling validation.

For the effective classification of PD, Ozcift [35] proposed a linear SVM for the feature selection, followed by training on the PD dataset with an ensemble classifier comprised of rotation forest (ROF) and IBk (a variant of K-NN). Compared to other methods, the ensemble method attained the highest accuracy of 96.93%. Guo, Bhattacharya [36] experimented with GP to train features and used the expectation–maximization algorithm to classify PD by transforming to a Gaussian mixture. The GP-EM method attained an accuracy of 93.12%. An effective and efficient system of diagnosis using fuzzy K-nearest neighbours (FKNN) was proposed by Chen, Huang [37]. FKNN-based approaches, with principal component analysis (PCA) as the feature reduction technique, obtained an accuracy of 96.07 ± 0.60%, considerably outperforming SVM-based methods, which had an accuracy of 86.60 ± 1.20%.

Hariharan, Polat [38] proposed an intelligent hybrid system composed of feature reduction methods (PCA, LDA) and the application of various classifiers, such as least square support vector machine (LS-SVM), probabilistic neural network (PNN), and general regression neural network (GRNN) to achieve a maximum accuracy of 100% for PD. However, sequential forward selection (SFS) and sequential backward selection (SBS) approaches were used in their study for the feature selection, and thus, they were unable to re-assess the significance of the features after they had been included or removed.
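The limitation noted above, that sequential selection never re-assesses a feature once it has been included, can be illustrated with a minimal sketch of sequential forward selection. The feature names `a`, `b`, `c` and the score table are hypothetical values chosen only for illustration; they are not taken from the cited study.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Greedily add the feature that most improves `score` until k are chosen."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Evaluate each candidate together with the features already chosen.
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)  # committed: this choice is never revisited
        remaining.remove(best)
    return selected


# Hypothetical subset scores: "a" is the best single feature,
# but the best pair is {"b", "c"}, which SFS can no longer reach.
TOY_SCORES = {
    frozenset({"a"}): 0.80,
    frozenset({"b"}): 0.70,
    frozenset({"c"}): 0.65,
    frozenset({"a", "b"}): 0.82,
    frozenset({"a", "c"}): 0.81,
    frozenset({"b", "c"}): 0.95,
}


def toy_score(subset: FrozenSet[str]) -> float:
    return TOY_SCORES[subset]


sfs_pair = sequential_forward_selection(["a", "b", "c"], toy_score, k=2)
best_pair = max((s for s in TOY_SCORES if len(s) == 2), key=toy_score)
print(sfs_pair, sorted(best_pair))
```

In this toy setting the greedy procedure commits to `a` in the first step and therefore returns a pair that scores lower than the best available pair.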

Das [39] compared neural networks (NN) with DM neural, regression and decision trees (DTs). Their experiment showed NNs had the highest accuracy of 92.9% compared to the other algorithms. Ozcift and Gulten [40] presented a correlation-based feature selection (CFS) algorithm for the feature reduction, and then, constructed an ensemble of RF classifiers comprised of 30 ML algorithms. The RF classifier ensemble produced an accuracy of 87.13% for the diagnosis of PD, which surpassed the base classifiers, which had an accuracy of 84.43%.

To improve the classification performance on a small PD dataset, Li, Liu [41] proposed a fuzzy-based non-linear transformation approach to enrich the classification information. Thereafter, the transformed dataset was optimized with PCA, and then trained with SVM. The proposed method generated an accuracy of 87.67% for the diagnosis of PD, which was a better performance than either the PCA or kernel principal component analysis (KPCA).

Mandal and Sairam [42] employed a robust inference framework consisting of sparse multinomial logistic regression classifiers with Haar wavelet transformation and new ensemble methods, using the ranker search method and SVM for the feature selection. A comparison was made between existing methods and the proposed method to validate the reliability of the performance analysis. Abayomi-Alli, Damaševičius [43] applied the spline interpolation and piecewise cubic Hermite interpolating polynomial interpolation methods to overcome the small dataset size. Augmented data were fed to a bidirectional LSTM (BiLSTM) deep learning network for the classification, and the results were compared with those obtained by some traditional ML algorithms.

Al Sayaydeha and Mohammad [44] employed a hybrid model in which a OneR attribute evaluator method was used to reduce and select the features, thereby leading to an improved classification with an enhanced fuzzy min–max (EFMM) neural network. The results suggested that the EFMM-OneR provided a better outcome. Anand, Haque [45] suggested PCA and KPCA techniques for the reduction of dimensionality. An examination of various classifiers revealed that the KNN demonstrated a higher accuracy (95.52%).

A prediction method proposed by Haq, Li [46] utilized deep neural networks based on a non-invasive prediction system. To improve the results, techniques such as the removal of missing values, a standard scaler, and a Min-Max scaler were used for the feature selection. The result that was obtained was better than with the LR, SVM and KNN. Marar, Swain [47] examined a multi-classifier for the prediction of PD while using a kernel SVM to vectorize the features. The best result was obtained with artificial neural networks (ANN) (94.87% accuracy) when compared to other classification models. Asmae, Abdelhadi [48] utilized a similar feature selection to that suggested by Little, McSharry [33], but their proposed ANN and KNN classifiers obtained accuracies of 96.7% and 79.31%, respectively. Finally, in a recent work, Mohamadzadeh, Pasban [49] applied a sparse representation algorithm for the feature reduction, and then utilised sparse code classifiers, such as the approximate message passing (AMP) algorithm, which yielded efficient results. To detect pathological voice, Fang, Tsao [50] used Mel frequency cepstral coefficients (MFCCs) as a feature selection approach with the DNN method, which obtained 94.26% and 90.52% accuracy in male and female subjects, respectively. Automated PD identification at an early stage was developed by applying deep convolutional neural networks (CNN) based on a discrete cosine transform (DCT) feature selection method, which obtained an accuracy of 89.75% [51]. In an experiment for voice-based PD detection, the recurrent neural network (RNN) model achieved 99.74% accuracy [52].

However, even though the DT is one of the most popular and increasingly used ML algorithms, it has not achieved a classification accuracy as high as that of other algorithms [53], [54], [55]. Even though the DT algorithm is robust, simple, easily understood and can interpret a complicated dataset, it still needs to be improved [43], [54], [56], [57]. Therefore, Wu and Guo [58] proposed a DT induction for the classification and prediction of PD as it is non-parametric, non-linear, and unaffected by data distribution complexity or unavailable data [59], [60]. Thus, decision tree induction (DTI) was proposed for the classification and prediction of PD in this paper based on the abovementioned characteristics.

DTI has been applied to handle imbalanced classifications, in which the classes are unequally distributed within a dataset. Medical practitioners are often faced with the technical challenge of a biased dataset with imbalanced classes, which will affect the performance of the classifier [61]. This can be handled by selecting an optimum model among the induced DTs [60], [62]. A mechanism called inductive inference within DT-based algorithms (e.g., J48, C4.5) involves moving from concrete cases to common models, whereby binary classes for continuous learning will be generated; however, the opportunity for enhancing the efficiency of the DT within a small tree depth is limited [63], [64]. To overcome the shortcomings of DTs, some researchers have suggested an ensemble of DT models [65], [66], [67], [68]. Inducing miscellaneous trees from a training dataset is fundamental to the creation of ensembles, such as the RF algorithm, which requires a large amount of computer memory for the voting scheme to select the final classification [53], [68].
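As a rough illustration of the greedy step that DT inducers in the C4.5 family rely on, the sketch below picks the threshold on one continuous attribute that maximises information gain. It is a simplified stand-in rather than the J48 implementation: C4.5 additionally uses the gain ratio and pruning, and the attribute values and labels here are hypothetical toy data.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[int],
                     threshold: float) -> float:
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values: Sequence[float], labels: Sequence[int]) -> float:
    """Greedy choice: the midpoint between adjacent values with the highest gain."""
    distinct: List[float] = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical toy attribute (e.g. one voice measure) with binary labels.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
split = best_threshold(values, labels)
gain = information_gain(values, labels, split)
print(split, gain)
```

Because each split is chosen only by its immediate gain, the tree as a whole is built by local decisions, which is the behaviour the evolutionary approach in this paper is intended to address.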

However, a single DT cannot be analysed comprehensively when ensemble approaches are applied. Additionally, it is claimed that bagging ensemble-based classification algorithms (commonly used in RF) have a low diversity due to the random selection, as certain original data instances may be reused multiple times while other data instances may not be used at all. Hence, the accuracy of the prediction might be affected [69]. As a result, ensembles are not a good fit for applications that require a high level of comprehensibility. In this paper, a DT was evolved by using evolutionary algorithms (EAs). Unlike traditional greedy inducers, which perform a local search, EAs perform a global search for robust model trees [70], [71]. Therefore, EAs provide a greater improvement to the analysis of attributes than greedy algorithms. EAs are inspired by the concept of biological evolution, whereby each individual represents a candidate solution that evolves and is evaluated until it has adapted to become the optimum solution [72]. The evolution process includes fitness, selection, crossover, mutation, and offspring functions, and these functions are iterated until an optimum threshold is obtained [73], [74], [75].
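The evolution cycle described above (fitness, selection, crossover, mutation, offspring) can be sketched as a small genetic algorithm. This is an illustrative sketch only, not the encoding proposed in this paper: each individual is a one-node tree (a decision stump), and the toy data, tournament selection, elitism and all parameter values are assumptions made for the example.

```python
import random
from typing import List, Tuple

# An individual is a decision stump: predict class 1 if x[feature] > threshold.
Stump = Tuple[int, float]

# Hypothetical toy data: two features per sample and a binary label.
DATA = [([0.1, 0.9], 0), ([0.2, 0.4], 0), ([0.3, 0.7], 0), ([0.4, 0.1], 0),
        ([0.6, 0.8], 1), ([0.7, 0.2], 1), ([0.8, 0.6], 1), ([0.9, 0.3], 1)]


def fitness(stump: Stump) -> float:
    """Fraction of toy samples the stump classifies correctly."""
    feature, threshold = stump
    hits = sum(int(x[feature] > threshold) == y for x, y in DATA)
    return hits / len(DATA)


def tournament(pop: List[Stump], rng: random.Random, size: int = 3) -> Stump:
    """Selection: the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)


def crossover(a: Stump, b: Stump, rng: random.Random) -> Stump:
    """Take the feature from one parent and the threshold from the other."""
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])


def mutate(stump: Stump, rng: random.Random, rate: float = 0.2) -> Stump:
    """Occasionally switch the feature or nudge the threshold."""
    feature, threshold = stump
    if rng.random() < rate:
        feature = rng.randrange(2)
    if rng.random() < rate:
        threshold = min(1.0, max(0.0, threshold + rng.gauss(0, 0.1)))
    return (feature, threshold)


def evolve(generations: int = 30, pop_size: int = 20,
           seed: int = 0) -> Tuple[Stump, Stump]:
    """Return the best individual of the initial and of the final population."""
    rng = random.Random(seed)
    pop = [(rng.randrange(2), rng.random()) for _ in range(pop_size)]
    initial_best = max(pop, key=fitness)
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: the best individual survives
        offspring = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + offspring
    return initial_best, max(pop, key=fitness)


initial_best, final_best = evolve()
print(fitness(initial_best), fitness(final_best))
```

With elitism, the best fitness in the population cannot decrease from one generation to the next; a fixed generation count stands in here for the stopping threshold mentioned in the text.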

In this paper, DT algorithms were applied for the classification of a biomedical dataset to train a model to discriminate people with Parkinsonism from healthy controls. The classification was then optimized by an EA to reduce the redundant components, increase the diagnostic accuracy, and improve the training time.

Section snippets

Theoretical background

The theories underpinning the study are presented in this section. Related studies, a survey of EAs and how evolutionary methods for DTI may be used to diagnose PD within vocal biomedical information will be covered in detail.

Research methodology

Neurological pathologists are in a position to take advantage of the analysis of data science to obtain comprehensive insights into Parkinsonism-related data for further vital decision-making. ML techniques enable them to understand logical and meaningful patterns through the training data gathered from real-world instances. Supervised classification methods have been widely used across numerous reputable medical-related conferences, which commonly compare the accuracy of DTs with various ML

Experiments on DT induction by EAs

In this section, the problem will be examined in depth with a real biomedical voice dataset. The reliability and accuracy of the DT developed with EAs will be tested, and finally, the outcomes will be compared.

Discussion

DTI algorithms are well-known for their model prediction in ML applications, both for addressing classification problems and for graphically constructing DTs. However, it is a challenge to determine the optimal values for the hyperparameters of DTIs, which are primarily designed manually. In this paper, hyper-heuristic EAs were proposed for designing a DTI to automatically improve the accuracy, the training time and the overfitting problem. This was achieved by firstly assessing PD with one of the popular DT

Conclusion

To increase the accuracy of PD detection through biomedical voice features, the suitability of three algorithms, namely, the J48, GA and GP, in evaluating DTs was determined. According to the experiments that were conducted, GP created the best model for the biomedical voice dataset. In this paper, an analysis of how EAs can be used to evolve DTs was achieved by examining how a DT was constructed by a greedy algorithm, such as the J48. A strategy for encoding DTs using EAs was proposed. The

CRediT authorship contribution statement

Mostafa Ghane: Conceptualization, Methodology, Investigation, Software, Data curation, Formal analysis, Writing – original draft, Writing – review & editing, Validation. Mei Choo Ang: Supervision, Methodology, Investigation, Writing – original draft, Writing – review & editing, Validation. Mehrbakhsh Nilashi: Supervision, Methodology, Investigation, Writing – original draft, Writing – review & editing, Validation. Shahryar Sorooshian: Investigation, Writing – original draft, Writing – review &

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (123)

  • M.M. Khan et al.

    Evolving multi-dimensional wavelet neural networks for classification using Cartesian Genetic Programming

    Neurocomputing

    (2017)
  • H.-L. Chen et al.

    An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach

    Expert Syst Appl

    (2013)
  • M. Hariharan et al.

    A new hybrid intelligent system for accurate detection of Parkinson's disease

    Comput Meth Programs Biomed

    (2014)
  • R. Das

    A comparison of multiple classification methods for diagnosis of Parkinson disease

    Expert Syst Appl

    (2010)
  • A. Ozcift et al.

    Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms

    Comput Meth Programs Biomed

    (2011)
  • D.-C. Li et al.

    A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets

    Artif Intell Med

    (2011)
  • I. Mandal et al.

    Accurate telemonitoring of Parkinson's disease diagnosis using robust inference system

    Int J Med Inform

    (2013)
  • S.-H. Fang et al.

    Detection of pathological voice using cepstrum vectors: A deep learning approach

    J Voice

    (2019)
  • O. Karaman et al.

    Robust automated Parkinson disease detection based on voice signals with transfer learning

    Expert Syst Appl

    (2021)
  • M.M. Ghiasi et al.

    Decision tree-based diagnosis of coronary artery disease: CART model

    Comput Methods Programs Biomed

    (2020)
  • M. Nilashi et al.

    A predictive method for hepatitis disease diagnosis using ensembles of neuro-fuzzy technique

    J Infect Publ Health

    (2019)
  • Y.F. Wu et al.

    Dysphonic voice pattern analysis of patients in Parkinson's disease using minimum interclass probability risk feature selection and bagging ensemble learning methods

    Comput Math Method Med

    (2017)
  • G. Solana-Lavalle et al.

    Analysis of voice as an assisting tool for detection of Parkinson's disease and its subsequent clinical interpretation

    Biomed Signal Process Control

    (2021)
  • M. Little et al.

    Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection

    Nature Precedings

    (2007)
  • S.H. Shahmoradian et al.

    Lewy pathology in Parkinson’s disease consists of crowded organelles and lipid membranes

    Nat Neurosci

    (2019)
  • M.Y. Chan et al.

    Voice therapy for Parkinson’s disease via smartphone videoconference in Malaysia: A preliminary study

    J Telemed Telecare

    (2021)
  • M. Pramanik et al.

    Assessment of acoustic features and machine learning for Parkinson’s detection

    J Healthc Eng

    (2021)
  • A.M. Altaher et al.

    Communication challenges for people with Parkinson disease

    Top Geriatr Rehabil

    (2020)
  • N.D. Pah et al.

    Phonemes based detection of parkinson’s disease for telehealth applications

    Sci Rep

    (2022)
  • S.Y. Chu et al.

    Effects of utterance rate and length on the spatiotemporal index in Parkinson’s disease

    Int J Speech-lang Pathol

    (2020)
  • R. Cacabelos

    Parkinson’s disease: from pathogenesis to pharmacogenomics

    Int J Mol Sci

    (2017)
  • S.-M. Fereshtehnejad et al.

    Clinical criteria for subtyping Parkinson’s disease: biomarkers and longitudinal progression

    Brain

    (2017)
  • M. Naghavi et al.

    Global, regional, and national age-sex specific mortality for 264 causes of death, 1980–2016: A systematic analysis for the Global Burden of Disease Study 2016

    Lancet

    (2017)
  • N. Miller et al.

    Utility and accuracy of perceptual voice and speech distinctions in the diagnosis of Parkinson's disease, PSP and MSA-P

    Neurodegener Dis Manag

    (2017)
  • Q.W. Oung et al.

    Evaluation of short-term cepstral based features for detection of Parkinson’s disease severity levels through speech signals

    IOP Conf Ser: Mater Sci Eng

    (2018)
  • F. Karlsson et al.

    Assessment of speech impairment in patients with Parkinson's disease from acoustic quantifications of oral diadochokinetic sequences

    J Acoust Soc Am

    (2020)
  • F. Amato et al.

    An algorithm for Parkinson’s disease speech classification based on isolated words analysis

    Health Inf Sci Syst

    (2021)
  • A. Rahman et al.

    Parkinson’s disease diagnosis in cepstral domain using MFCC and dimensionality reduction with SVM classifier

    Mob Inf Sys

    (2021)
  • L. Jeancolas et al.

    X-vectors: new quantitative biomarkers for early Parkinson's disease detection from speech

    Front Neuroinf

    (2021)
  • A. Suppa et al.

    Voice in Parkinson's disease: a machine learning study

    Front Neurol

    (2022)
  • W. Wang et al.

    Early detection of Parkinson’s disease using deep learning and machine learning

    IEEE Access

    (2020)
  • J. Mei et al.

    Machine learning for the diagnosis of parkinson's disease: A review of literature

    Front Aging Neurosci

    (2021)
  • S. Lahmiri et al.

    Performance of machine learning methods in diagnosing Parkinson’s disease based on dysphonia measures

    Biomed Eng Lett

    (2018)
  • M. Little et al.

    Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease

    Nat Preced

    (2008)
  • C.O. Sakar et al.

    Telediagnosis of Parkinson’s disease using measurements of dysphonia

    J Med Syst

    (2010)
  • A. Ozcift

    SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease

    J Med Syst

    (2012)
  • Guo P-F, Bhattacharya P, Kharma N, editors. Advances in detecting Parkinson’s disease. International Conference on...
  • Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, Abayomi-Alli A, editors. BiLSTM with Data Augmentation using...
  • Al Sayaydeha ON, Mohammad MF, editors. Diagnosis of the Parkinson disease using enhanced fuzzy min-max neural network...
  • Anand A, Haque MA, Alex JSR, Venkatesan N, editors. Evaluation of Machine learning and Deep learning algorithms...