Enhanced decision tree induction using evolutionary techniques for Parkinson's disease classification

doi:10.1016/j.bbe.2022.07.002

Biocybernetics and Biomedical Engineering

Volume 42, Issue 3, July–September 2022, Pages 902-920

Abstract

The diagnosis of Parkinson's disease (PD) is important in neurological pathology for appropriate medical therapy. Algorithms based on decision tree induction (DTI) have been widely used for diagnosing PD through biomedical voice disorders. However, DTI for PD diagnosis is based on a greedy search algorithm, which can cause overfitting and yield inferior solutions. This paper improves the performance of DTI using evolutionary techniques. The goal was to combine two such techniques, namely a genetic algorithm (GA) and genetic programming (GP), with a decision tree algorithm (J48) to improve the classification performance. The developed model was applied to a real biomedical dataset for the diagnosis of PD. The results showed that the accuracy of J48 was improved from 80.51% to 89.23% using the GA and to 90.76% using GP.

Introduction

Voice disorders, which may arise from physiological diseases, are commonly observed in patients with Parkinson's disease [1], and speech clinicians are able to measure voice functions objectively with acoustic tools [2]. Parkinson’s disease (PD) is a neurodegenerative disease that is characterised by the abnormal formation of Lewy bodies in the brain [3]. PD causes the deterioration of motor function [4], including dysphonia, which is the impaired operation of the phonatory system [5], and communication problems known as dysarthria [6], [7], and it affects the rate and length of utterances [8]. It is the second most common age-related neurodegenerative disorder after Alzheimer’s disease, with a prevalence ranging from 41 per 100,000 population for those aged 40 years and above to 1,900 per 100,000 population for those over 80 years of age [9], [10].

Considering the trend of increasing life expectancy across the globe [11], it is paramount that PD be diagnosed early so that treatment can begin promptly [12], [13]. For example, the acoustic analysis of vowels, combined with the minimum average maximum (MAMa) tree and singular value decomposition, can be applied to diagnose PD [14]. The significance of the acoustic signal increases when it is considered in its totality as a way to comprehend the uniqueness of speech changes [15]. Likewise, speech signals are an appropriate biomarker for measuring the severity of PD, whether at mild or moderate levels [16]. In another investigation, acoustic quantifications of oral diadochokinetic sequences were used to predict the level of speech impairment [17]. A multi-level analysis was also performed using an optimized k-nearest neighbours model for the binary classification of PD [18]. In another experiment, a combination of linear discriminant analysis (LDA) and support vector machine (SVM) was proposed in order to decrease the dimensionality of various speech features in PD [19]. Empirical mode decomposition (EMD) has also been proposed for the extraction of vocal characteristics, which are then classified by SVM and random forest (RF), both commonly used for binary classification [20]. The EMD method decomposes a non-stationary signal into a series of intrinsic mode functions, and the extracted features are then fed into classifiers such as SVM and RF [21]. The SVM-trained Unified PD Rating Scale Motor Examination of Speech (UPDRS-S), which is a collection of speech samples, is obtained from measurements of respiration, phonation, articulation and prosody [22]. Recently, deep neural networks (DNNs) have been used to develop the latest speaker recognition system for detecting PD at an early stage [23].
Briefly, Mel-frequency cepstral coefficients (MFCCs), which contain information related to articulation and phonation, have been fed to DNNs to extract x-vector embeddings. According to Jeancolas, Petrovska-Delacretaz [23], x-vectors robustly represent the characteristics of speakers and can discriminate between people with early-stage PD and healthy individuals. Studies on amplitude tremor frequencies, which include intensity and power indices, have shown an increase in sustained vowels recorded from people who were diagnosed with PD and were off medication [24].

An incremental machine learning (ML) technique was proposed to overcome the deficiency of supervised methods in the prediction of UPDRS [12]. Additionally, ML was used to identify new biomarkers of PD by quantifying the symptomatic effects on voice parameters and tracking disease severity [25]. In neurological pathology, it is therefore common for ML algorithms and data mining tasks to be applied to clinical databases to evaluate the early diagnosis of PD [26], predict the progression of PD [27], classify microelectrode recording signals in PD patients to reduce errors in deep brain stimulation surgery [28], and so forth.

The variety and range of techniques and algorithms that have been developed for the diagnosis of Parkinson’s disease have enhanced the prediction and classification of dysphonia features for PD. Choosing an appropriate technique therefore poses a major challenge [29], especially for acoustic datasets. The ML models that have been applied to voice recordings for the diagnosis of PD are outlined, with a brief overview of the results, in [30]. Indeed, the studies listed in [30] illustrate various works that focused on improving the classification of PD on similar datasets, including the dataset by Max Little of the University of Oxford. This dataset was retrieved from the University of California Irvine (UCI) Machine Learning Repository and will be discussed in depth in a later section.

Dysphonia feature presentation techniques were reviewed in this paper to demonstrate how they differ from the method that will be presented in Section 3. Sharma, Sundaram [13] found that the modified grey wolf optimization (MGWO) algorithm is a suitable strategy for feature selection, obtaining an accuracy of 93.87% with RF for the classification of PD. Lahmiri, Dawson [31] evaluated the radial basis function neural network (RBFNN), SVM and several other ML methods. Their experiment on the identification of PD showed that the SVM outperformed all the other methods and that the RBFNN needed a large dataset to obtain better results. Khan, Mendes [32] developed a system for PD diagnosis using Cartesian genetic programming (CGP) to evolve a multi-dimensional wavelet neural network, and achieved an accuracy of 90.13%. Little, McSharry [33] proposed a new method for constructing features based on the calculation of traditional (Kay Pentax Multi-Dimensional Voice Program), non-standard (correlation dimension D2) and pitch period entropy (PPE) measures. Then, a Gaussian kernel density method was applied to select high correlation features, and the SVM was used thereafter for the classification. They found a combination of non-standard and traditional features, namely HNR, RPDE, DFA and PPE, that achieved an accuracy of 91.4%. Sakar and Kursun [34] used minimum redundancy maximum relevance (mRMR) to assess the relevance of features for PD. This study built a classification model using SVM and obtained an accuracy of 92.75 ± 1.21% with bootstrap resampling validation.

For the effective classification of PD, Ozcift [35] proposed a linear SVM for the feature selection, followed by training on the PD dataset with an ensemble classifier comprising rotation forest (ROF) and IBk (a variant of K-NN). Compared to other methods, the ensemble method attained the highest accuracy of 96.93%. Guo, Bhattacharya [36] experimented with GP to train features and used the expectation–maximization algorithm to classify PD by transforming to a Gaussian mixture. The GP-EM method attained an accuracy of 93.12%. An effective and efficient system of diagnosis using fuzzy K-nearest neighbours (FKNN) was proposed by Chen, Huang [37]. FKNN-based approaches, with principal component analysis (PCA) as the feature reduction technique, obtained an accuracy of 96.07 ± 0.60%, considerably outperforming SVM-based methods, which had an accuracy of 86.60 ± 1.20%.
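Accuracies reported in the form 96.07 ± 0.60% are typically a mean and a standard deviation over repeated validation runs. The following is a minimal sketch of that summary, assuming five hypothetical fold accuracies; the numbers and the function name `summarise` are illustrative and are not taken from any of the cited studies.

```python
import statistics


def summarise(accuracies):
    """Return (mean, sample standard deviation) of a list of accuracies in percent."""
    return statistics.fmean(accuracies), statistics.stdev(accuracies)


# Hypothetical per-fold accuracies (%) from a 5-fold run; not values from any cited study.
fold_accuracy = [95.0, 96.0, 97.0, 96.0, 96.0]

mean, spread = summarise(fold_accuracy)
print(f"{mean:.2f} ± {spread:.2f}%")
```

Whether the spread is a sample or a population standard deviation, and whether it comes from cross-validation folds or bootstrap resamples, differs between studies, so figures reported this way are only loosely comparable.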

Hariharan, Polat [38] proposed an intelligent hybrid system composed of feature reduction methods (PCA, LDA) and the application of various classifiers, such as least square support vector machine (LS-SVM), probabilistic neural network (PNN), and general regression neural network (GRNN) to achieve a maximum accuracy of 100% for PD. However, sequential forward selection (SFS) and sequential backward selection (SBS) approaches were used in their study for the feature selection, and thus, they were unable to re-assess the significance of the features after they had been included or removed.

Das [39] compared neural networks (NNs) with DM neural, regression and decision trees (DTs). The experiment showed that NNs had the highest accuracy, 92.9%, among the compared algorithms. Ozcift and Gulten [40] presented a correlation-based feature selection (CFS) algorithm for the feature reduction, and then constructed an ensemble of RF classifiers comprising 30 ML algorithms. The RF classifier ensemble produced an accuracy of 87.13% for the diagnosis of PD, surpassing the base classifiers, which had an accuracy of 84.43%.

To improve the classification performance on a small PD dataset, Li, Liu [41] proposed a fuzzy-based non-linear transformation approach to add value to the classification information. Thereafter, the transformed dataset was optimized with PCA, and then trained with SVM. The proposed method generated an accuracy of 87.67% for the diagnosis of PD, which was a better performance than either PCA or kernel principal component analysis (KPCA).

Mandal and Sairam [42] employed a robust inference framework consisting of sparse multinomial logistic regression classifiers with Haar wavelet transformation and new ensemble methods, using the ranker search method and SVM for the feature selection. A comparison was made between existing methods and the proposed method to validate the reliability of the performance analysis. Abayomi-Alli, Damaševičius [43] applied the spline interpolation and piecewise cubic Hermite interpolating polynomial interpolation methods to overcome the small dataset size. Augmented data were fed to a bidirectional LSTM (BiLSTM) deep learning network for the classification, and the results were compared with those obtained by some traditional ML algorithms.

Al Sayaydeha and Mohammad [44] employed a hybrid model in which a OneR attribute evaluator method was used to reduce and select the features, thereby leading to an improved classification with an enhanced fuzzy min–max (EFMM) neural network. The results suggested that the EFMM-OneR provided a better outcome. Anand, Haque [45] suggested PCA and KPCA techniques for the reduction of dimensionality. An examination of various classifiers revealed that the KNN demonstrated a higher accuracy (95.52%).

A prediction method proposed by Haq, Li [46] utilized deep neural networks in a non-invasive prediction system. To improve the results, techniques such as the removal of missing values, a standard scaler, and a Min-Max scaler were used for the feature selection. The result that was obtained was better than that obtained with LR, SVM and KNN. Marar, Swain [47] examined a multi-classifier for the prediction of PD while using a kernel SVM to vectorize the features. The best result was obtained with artificial neural networks (ANN) (94.87% accuracy) when compared to other classification models. Asmae, Abdelhadi [48] utilized a feature selection similar to that suggested by Little, McSharry [33], and their proposed ANN and KNN classifiers obtained accuracies of 96.7% and 79.31%, respectively. Finally, in a recent work, Mohamadzadeh, Pasban [49] applied a sparse representation algorithm for the feature reduction, and then utilised sparse code classifiers, such as the approximate message passing (AMP) algorithm, which yielded efficient results. To detect pathological voice, Fang, Tsao [50] used Mel frequency cepstral coefficients (MFCCs) as a feature selection approach with the DNN method, which obtained 94.26% and 90.52% accuracy in male and female subjects, respectively. Automated PD identification at an early stage was developed by applying deep convolutional neural networks (CNN) based on a discrete cosine transformation (DCT) feature selection method, which obtained an accuracy of 89.75% [51]. In an experiment for voice-based PD detection, the recurrent neural network (RNN) model achieved 99.74% accuracy [52].
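The standard and Min-Max scaling steps mentioned in connection with Haq, Li [46] are usually defined as follows. This is a minimal sketch in plain Python, not the preprocessing code of that study; the function names and the sample values are illustrative assumptions, and the standard scaling here assumes the population standard deviation.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # a constant feature carries no spread
    return [(v - low) / (high - low) for v in values]


def standard_scale(values):
    """Centre values on zero mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Made-up values of one voice feature, used only to exercise the functions.
feature = [2.0, 4.0, 6.0, 8.0]
print(min_max_scale(feature))
print(standard_scale(feature))
```

Both transforms change only the scale of a feature, not the ordering of its values, so they matter for distance-based and gradient-based classifiers far more than for threshold-based DTs.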

However, even though the DT is one of the most popular and increasingly used ML algorithms, it has not achieved classification accuracy as high as that of other algorithms [53], [54], [55]. Although the DT algorithm is robust, simple, easily understood and able to interpret a complicated dataset, it still needs to be improved [43], [54], [56], [57]. Wu and Guo [58] proposed DT induction for the classification and prediction of PD because it is non-parametric, non-linear, and unaffected by data distribution complexity or unavailable data [59], [60]. Based on these characteristics, decision tree induction (DTI) was adopted for the classification and prediction of PD in this paper.

DTI has been applied to handle imbalanced classification, in which the classes within a dataset are unequally distributed. Medical practitioners are often faced with the technical challenge of a biased dataset with imbalanced classes, which will affect the performance of the classifier [61]. This can be handled by selecting an optimum model among the induced DTs [60], [62]. A mechanism called inductive inference within DT-based algorithms (e.g., J48, C4.5) involves moving from concrete cases to common models, whereby binary classes for continuous learning will be generated; however, the opportunity for enhancing the efficiency of the DT within a small tree depth is limited [63], [64]. To overcome the shortcomings of DTs, some researchers have suggested an ensemble of DT models [65], [66], [67], [68]. Inducing miscellaneous trees from a training dataset is fundamental to the creation of ensembles, such as the RF algorithm, which requires substantial computer memory for the voting scheme that selects the final classification [53], [68].
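The greedy step at the heart of inducers such as C4.5 (of which J48 is the Weka implementation) chooses, at each node, the split that looks best locally according to an entropy-based criterion. The following is a simplified sketch of that step for one numeric attribute, using the gain ratio; the function names and the toy values are illustrative assumptions, and real implementations add refinements such as pruning and missing-value handling.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on one numeric attribute."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing is worthless
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Greedy choice: try midpoints between sorted distinct values and keep the best."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Toy, made-up values of a single voice feature (not taken from the PD dataset).
feature = [0.10, 0.15, 0.20, 0.60, 0.65, 0.70]
status = ["healthy", "healthy", "healthy", "pd", "pd", "pd"]
print(best_threshold(feature, status))
```

Because each split is fixed once chosen and never revisited, a locally best split near the root can lock the tree into a globally inferior structure, which is the weakness that a global search aims to address.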

However, ensemble approaches do not allow a single DT to be analysed comprehensively. Additionally, it is claimed that bagging ensemble-based classification algorithms (commonly used in RF) have low diversity because of random selection: certain original data instances may be reused multiple times, while other instances may not be used at all. Hence, the accuracy of the prediction might be affected [69]. As a result, ensembles are not a good fit for applications that require a high level of comprehensibility. In this paper, a DT was evolved by using evolutionary algorithms (EAs). Unlike traditional greedy inducers, which perform a local search, EAs perform a global search and can generate robust, near-optimal model trees [70], [71]. Therefore, EAs can analyse attributes more thoroughly than greedy algorithms. EAs are inspired by the concept of biological evolution, whereby each individual represents a candidate solution that evolves and is evaluated until it has adapted to become the optimum solution [72]. The evolution process includes fitness evaluation, selection, crossover, mutation, and offspring generation, and these steps are repeated until a stopping threshold is reached [73], [74], [75].
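The evolutionary loop described above can be sketched in a few lines. This is a deliberately reduced illustration and not the encoding used in this paper: each individual is a one-node tree (a decision stump) rather than a full DT, the data are made up, and all names and parameter values (`evolve`, tournament size, mutation rate) are assumptions chosen for brevity.

```python
import random

# Toy, made-up data: two features per sample, label 1 = PD, 0 = healthy.
SAMPLES = [
    ((0.2, 0.9), 0), ((0.3, 0.4), 0), ((0.1, 0.7), 0), ((0.4, 0.2), 0),
    ((0.7, 0.8), 1), ((0.8, 0.3), 1), ((0.9, 0.6), 1), ((0.6, 0.1), 1),
]


def predict(stump, features):
    """A stump is (feature_index, threshold); predict PD when the feature exceeds it."""
    index, threshold = stump
    return 1 if features[index] > threshold else 0


def fitness(stump):
    """Fitness = classification accuracy of the stump on the toy data."""
    hits = sum(predict(stump, x) == y for x, y in SAMPLES)
    return hits / len(SAMPLES)


def select(population, rng, k=3):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Child takes the feature index from one parent and the threshold from the other."""
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])


def mutate(stump, rng, rate=0.2):
    """With probability `rate` each, flip the feature index and nudge the threshold."""
    index, threshold = stump
    if rng.random() < rate:
        index = 1 - index
    if rng.random() < rate:
        threshold = min(1.0, max(0.0, threshold + rng.uniform(-0.1, 0.1)))
    return (index, threshold)


def evolve(generations=30, size=20, seed=0):
    """Run the loop: evaluate, select, recombine, mutate, and repeat."""
    rng = random.Random(seed)
    population = [(rng.randrange(2), rng.random()) for _ in range(size)]
    for _ in range(generations):
        next_generation = [max(population, key=fitness)]  # elitism keeps the best
        while len(next_generation) < size:
            child = crossover(select(population, rng), select(population, rng), rng)
            next_generation.append(mutate(child, rng))
        population = next_generation
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Evolving whole trees follows the same loop, but crossover and mutation then operate on subtrees, and the fitness function usually penalises tree size as well as rewarding accuracy.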

In this paper, DT algorithms were applied to the classification of a biomedical dataset to train a model that discriminates people with Parkinsonism from healthy controls. The classification was then optimized by an EA to reduce the redundant components, increase the diagnostic accuracy, and improve the training time.

Theoretical background

The theories underpinning the study are presented in this section. Related studies, a survey of EAs and how evolutionary methods for DTI may be used to diagnose PD within vocal biomedical information will be covered in detail.

Research methodology

Neurological pathologists are in a position to take advantage of the analysis of data science to obtain comprehensive insights into Parkinsonism-related data for further vital decision-making. ML techniques enable them to understand logical and meaningful patterns through the training data gathered from real-world instances. Supervised classification methods have been widely used across numerous reputable medical-related conferences, which commonly compare the accuracy of DTs with various ML

Experiments on DT induction by EAs

In this section, the problem will be examined in depth with a real biomedical voice dataset. The reliability and accuracy of the DT developed with EAs will be tested, and finally, the outcomes will be compared.

Discussion

DTI algorithms are well known for their model prediction in ML applications, both for addressing classification problems and for graphically constructing DTs. However, it is a challenge to determine the optimal values for the hyperparameters of DTIs, which are primarily set manually. In this paper, hyper-heuristic EAs were proposed for designing a DTI to automatically improve the accuracy and training time and to reduce overfitting. This was achieved by firstly assessing PD with one of the popular DT

Conclusion

To increase the accuracy of PD detection through biomedical voice features, the suitability of three algorithms, namely the J48, GA and GP, in evaluating DTs was determined. According to the experiments that were conducted, GP created the best model for the biomedical voice dataset. In this paper, how EAs can be used to evolve DTs was analysed by examining how a DT is constructed by a greedy algorithm, such as the J48. A strategy for encoding DTs using EAs was proposed. The

CRediT authorship contribution statement

Mostafa Ghane: Conceptualization, Methodology, Investigation, Software, Data curation, Formal analysis, Writing – original draft, Writing – review & editing, Validation. Mei Choo Ang: Supervision, Methodology, Investigation, Writing – original draft, Writing – review & editing, Validation. Mehrbakhsh Nilashi: Supervision, Methodology, Investigation, Writing – original draft, Writing – review & editing, Validation. Shahryar Sorooshian: Investigation, Writing – original draft, Writing – review &

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (123)

M. Nilashi et al. A hybrid intelligent system for the prediction of Parkinson's Disease progression using machine learning techniques. Biocybern Biomed Eng (2018)
P. Sharma et al. Diagnosis of Parkinson’s disease using modified grey wolf optimization. Cogn Syst Res (2019)
T. Tuncer et al. Automated detection of Parkinson's disease using minimum average maximum tree and singular value decomposition method with vowels. Biocybern Biomed Eng (2020)
B. Karan et al. Parkinson disease prediction using intrinsic mode function based features from speech signal. Biocybern Biomed Eng (2020)
T. Zhang et al. Parkinson disease detection using energy direction features based on EMD from voice signal. Biocybern Biomed Eng (2021)
T. Khan et al. Classification of speech intelligibility in Parkinson's disease. Biocybern Biomed Eng (2014)
M. Brückl et al. Measurement of tremor in the voices of speakers with Parkinson’s disease. Procedia Comput Sci (2018)
M. Nilashi et al. An analytical method for measuring the Parkinson’s disease progression: A case on a Parkinson’s telemonitoring dataset. Measurement (2019)
M. Hosny et al. Detection of subthalamic nucleus using novel higher-order spectra features in microelectrode recordings signals. Biocybern Biomed Eng (2021)
M. Nilashi et al. Remote tracking of Parkinson's disease progression using ensembles of deep belief network and self-organizing map. Expert Syst Appl (2020)

M.M. Khan et al. Evolving multi-dimensional wavelet neural networks for classification using Cartesian Genetic Programming. Neurocomputing (2017)
H.-L. Chen et al. An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach. Expert Syst Appl (2013)
M. Hariharan et al. A new hybrid intelligent system for accurate detection of Parkinson's disease. Comput Meth Programs Biomed (2014)
R. Das. A comparison of multiple classification methods for diagnosis of Parkinson disease. Expert Syst Appl (2010)
A. Ozcift et al. Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Comput Meth Programs Biomed (2011)
D.-C. Li et al. A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets. Artif Intell Med (2011)
I. Mandal et al. Accurate telemonitoring of Parkinson's disease diagnosis using robust inference system. Int J Med Inform (2013)
S.-H. Fang et al. Detection of pathological voice using cepstrum vectors: A deep learning approach. J Voice (2019)
O. Karaman et al. Robust automated Parkinson disease detection based on voice signals with transfer learning. Expert Syst Appl (2021)
M.M. Ghiasi et al. Decision tree-based diagnosis of coronary artery disease: CART model. Comput Methods Programs Biomed (2020)
M. Nilashi et al. A predictive method for hepatitis disease diagnosis using ensembles of neuro-fuzzy technique. J Infect Publ Health (2019)
Y.F. Wu et al. Dysphonic voice pattern analysis of patients in Parkinson's disease using minimum interclass probability risk feature selection and bagging ensemble learning methods. Comput Math Method Med (2017)
G. Solana-Lavalle et al. Analysis of voice as an assisting tool for detection of Parkinson's disease and its subsequent clinical interpretation. Biomed Signal Process Control (2021)
M. Little et al. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. Nature Precedings (2007)
S.H. Shahmoradian et al. Lewy pathology in Parkinson’s disease consists of crowded organelles and lipid membranes. Nat Neurosci (2019)
M.Y. Chan et al. Voice therapy for Parkinson’s disease via smartphone videoconference in Malaysia: A preliminary study. J Telemed Telecare (2021)
M. Pramanik et al. Assessment of acoustic features and machine learning for Parkinson’s detection. J Healthc Eng (2021)
A.M. Altaher et al. Communication challenges for people with Parkinson disease. Top Geriatr Rehabil (2020)
N.D. Pah et al. Phonemes based detection of parkinson’s disease for telehealth applications. Sci Rep (2022)
S.Y. Chu et al. Effects of utterance rate and length on the spatiotemporal index in Parkinson’s disease. Int J Speech-lang Pathol (2020)
R. Cacabelos. Parkinson’s disease: from pathogenesis to pharmacogenomics. Int J Mol Sci (2017)
S.-M. Fereshtehnejad et al. Clinical criteria for subtyping Parkinson’s disease: biomarkers and longitudinal progression. Brain (2017)
M. Naghavi et al. Global, regional, and national age-sex specific mortality for 264 causes of death, 1980–2016: A systematic analysis for the Global Burden of Disease Study 2016. Lancet (2017)
N. Miller et al. Utility and accuracy of perceptual voice and speech distinctions in the diagnosis of Parkinson's disease, PSP and MSA-P. Neurodegener Dis Manag (2017)
Q.W. Oung et al. Evaluation of short-term cepstral based features for detection of Parkinson’s disease severity levels through speech signals. IOP Conf Ser: Mater Sci Eng (2018)
F. Karlsson et al. Assessment of speech impairment in patients with Parkinson's disease from acoustic quantifications of oral diadochokinetic sequences. J Acoust Soc Am (2020)
F. Amato et al. An algorithm for Parkinson’s disease speech classification based on isolated words analysis. Health Inf Sci Syst (2021)
A. Rahman et al. Parkinson’s disease diagnosis in cepstral domain using MFCC and dimensionality reduction with SVM classifier. Mob Inf Sys (2021)
L. Jeancolas et al. X-vectors: new quantitative biomarkers for early Parkinson's disease detection from speech. Front Neuroinf (2021)
A. Suppa et al. Voice in Parkinson's disease: a machine learning study. Front Neurol (2022)
W. Wang et al. Early detection of Parkinson’s disease using deep learning and machine learning. IEEE Access (2020)
J. Mei et al. Machine learning for the diagnosis of parkinson's disease: A review of literature. Front Aging Neurosci (2021)
S. Lahmiri et al. Performance of machine learning methods in diagnosing Parkinson’s disease based on dysphonia measures. Biomed Eng Lett (2018)
M. Little et al. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Nat Preced (2008)
C.O. Sakar et al. Telediagnosis of Parkinson’s disease using measurements of dysphonia. J Med Syst (2010)
A. Ozcift. SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. J Med Syst (2012)

Guo P-F, Bhattacharya P, Kharma N, editors. Advances in detecting Parkinson’s disease. International Conference on...

Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, Abayomi-Alli A, editors. BiLSTM with Data Augmentation using...

Al Sayaydeha ON, Mohammad MF, editors. Diagnosis of the Parkinson disease using enhanced fuzzy min-max neural network...

Anand A, Haque MA, Alex JSR, Venkatesan N, editors. Evaluation of Machine learning and Deep learning algorithms...

Cited by (15)

Parkinson's disease diagnosis using deep learning: A bibliometric analysis and literature review
2024, Ageing Research Reviews
Parkinson’s Disease (PD) is a progressive neurodegenerative illness triggered by decreased dopamine secretion. Deep Learning (DL) has gained substantial attention in PD diagnosis research, with an increase in the number of published papers in this discipline. PD detection using DL has presented more promising outcomes as compared with common machine learning approaches. This article aims to conduct a bibliometric analysis and a literature review focusing on the prominent developments taking place in this area. To achieve the target of the study, we retrieved and analyzed the available research papers in the Scopus database. Following that, we conducted a bibliometric analysis to inspect the structure of keywords, authors, and countries in the surveyed studies by providing visual representations of the bibliometric data using VOSviewer software. The study also provides an in-depth review of the literature focusing on different indicators of PD, deployed approaches, and performance metrics. The outcomes indicate the firm development of PD diagnosis using DL approaches over time and a large diversity of studies worldwide. Additionally, the literature review presented a research gap in DL approaches related to incremental learning, particularly in relation to big data analysis.
Semantic TRIZ feasibility in technology development, innovation, and production: A systematic review
2024, Heliyon
The study unfolds with an acknowledgment of the extensive exploration of TRIZ components, spanning a solid philosophy, quantitative and inductive methods, and practical tools, over the years. While the adoption of Semantic TRIZ (S-TRIZ) in high-tech industries for system development, innovation, and production has increased, the application of AI technologies to specific TRIZ components remains unexplored. This systematic literature review is conducted to delve into the detailed integration of AI with TRIZ, particularly S-TRIZ. The results elucidate the current state of AI applications within TRIZ, identifying focal TRIZ components and areas requiring further study. Additionally, the study highlights the trending AI technologies in this context. This exploration serves as a foundational resource for researchers, developers, and inventors, providing valuable insights into the integration of AI technologies with TRIZ concepts. The study not only paves the way for the development and automation of S-TRIZ but also outlines limitations for future research, guiding the trajectory of advancements in this interdisciplinary field.
A systematic review of the soft computing methods shaping the future of the metaverse
2024, Applied Soft Computing
The metaverse is an emerging technology with the potential to revolutionize our interactions with digital environments. Soft computing presents exciting opportunities in shaping this immersive virtual world. This paper provides a systematic review of the research on soft computing methods in the metaverse, highlighting the interdisciplinary nature of the field and the need for coordination to shape its future. The systematic literature review conducted in this article identifies the contributors and domains in soft computing, emphasizing the need for new developments and joint applications in soft computing and the metaverse. The study categorizes soft computing techniques into five classes - machine learning, fuzzy systems, evolutionary computing, probability analysis, and mixed/hybrid methods - contributing to the emerging metaverse-related research and development. We propose a decision framework for selecting the most suitable soft computing method to assist researchers and developers in methodically assessing the alternative methods. The findings provide a roadmap and opportunities for soft computing models and applications shaping the future of the metaverse. This article can serve as a useful reference for researchers, practitioners, and policymakers working in soft computing and the metaverse.
A novel method for petroleum and natural gas resource potential evaluation and prediction by support vector machines (SVM)
2023, Applied Energy
Petroleum and natural gas resources (PNGR) are major forms of fossil energy that are important for industrial development and energy security. With growing petroleum demand and the need to raise drilling success rates, reduce exploration risk and cut exploration costs, a prediction method for PNGR potential that is both highly accurate and widely practicable is needed. However, existing PNGR evaluation and prediction methods based on traditional statistical principles fall far short of the requirements of present-day petroleum exploration and exploitation. This study therefore introduces a novel method for PNGR potential prediction that applies support vector machines (SVM), in the context of the rapid development of artificial intelligence and machine learning. The methodology first applies support vector classification (SVC) to predict hydrocarbon accumulation probability and then support vector regression (SVR) to predict reserve abundance. The combined use of classification and regression models can fully utilize professional knowledge of petroleum geology together with the data processing capabilities of machine learning algorithms, and hence significantly improves the performance of the method. Furthermore, the dataset is built on petroleum geology knowledge, with feature variables describing source rock, sandstone reservoirs, sealing capacity and hydrocarbon migration, whose distributions are predictable, which supports predictive performance in practical petroleum exploration. The results show that the testing accuracy of the SVC model for hydrocarbon accumulation probability ranges from 80% to 100%, with an average of 88.92%. The SVR model for evaluating reserve abundance also performs well, with a highest correlation coefficient of 0.767. In addition, several validation approaches are applied to test the reliability and stability of the model. In a hold-out test on a new zone, the model predicts hydrocarbon accumulation probability and reserve abundance with an accuracy of 72.5% and a correlation coefficient of 0.744, respectively. The F1-score averages 0.91 for the SVC models, and 4-fold cross-validation gives an average correlation coefficient of 0.663 for the SVR model, indicating good performance of both models. In conclusion, this study provides an intelligent ML method for the precise evaluation and prediction of PNGR potential that combines SVC and SVR, which the authors report as a first among ML applications in the petroleum industry, and it is also significant for the application of ML in petroleum and natural gas exploration and exploitation.
Automatic design of machine learning via evolutionary computation: A survey
2023, Applied Soft Computing
Machine learning (ML), as the most promising paradigm to discover deep knowledge from data, has been widely applied to practical applications, such as recommender systems, virtual reality, and semantic segmentation. However, building a high-quality ML system for given tasks requires expert knowledge and high computation cost. This poses a significant challenge to the further development of ML in large-scale practical applications. The automatic design of ML has become an increasingly popular research trend. At the same time, evolutionary computation (EC), as an excellent heuristic search technique, has been widely employed in ML optimization, so-called evolutionary machine learning (EML). In this paper, we offer a comprehensive review of the literature (more than 500 references) for EML methods. We first introduce the concepts related to ML and EC. After that, we propose a taxonomy criterion based on the ML and EC perspectives. The important research problems of EML, e.g., ML algorithms, solution representations, search paradigms, acceleration strategies and applications, are reviewed systematically. Lastly, we analyze EML limitations and discuss potential trends that are promising to address in the future.
Electroencephalography (EEG) eye state classification using learning vector quantization and bagged trees
2023, Heliyon
The analysis of electroencephalography (EEG) signals has been an effective way of identifying eye state. Its significance is highlighted by studies that examined the classification of eye states using machine learning techniques. In previous studies, supervised learning techniques have been widely used in EEG signal analysis for eye state classification, with the main goal of improving classification accuracy through novel algorithms. The trade-off between classification accuracy and computational complexity is an important consideration in EEG signal analysis. In this paper, a hybrid method that can handle multivariate and non-linear signals is proposed. It combines supervised and unsupervised learning to achieve fast EEG eye state classification with high prediction accuracy, making it applicable to real-time decision-making. We use the Learning Vector Quantization (LVQ) technique and bagged tree techniques. The method was evaluated on a real-world EEG dataset that included 14976 instances after the removal of outlier instances. Using LVQ, 8 clusters were generated from the data. The bagged tree was applied to the 8 clusters and compared with other classifiers. Our experiments revealed that LVQ combined with the bagged tree provides the best results (Accuracy = 0.9431) compared with the bagged tree, CART (Classification And Regression Tree) (Accuracy = 0.8200), LDA (Linear Discriminant Analysis) (Accuracy = 0.7931), Random Trees (Accuracy = 0.8311), Naïve Bayes (Accuracy = 0.8331) and Multilayer Perceptron (Accuracy = 0.7718), which demonstrates the effectiveness of incorporating ensemble learning and clustering approaches in the analysis of EEG signals. We also report the prediction speed of the methods in observations per second (Obs/Sec). The results showed that LVQ + Bagged Tree provides the best prediction speed (58942 Obs/Sec) compared with Bagged Tree (28453 Obs/Sec), CART (27784 Obs/Sec), LDA (26435 Obs/Sec), Random Trees (27921 Obs/Sec), Naïve Bayes (27217 Obs/Sec) and Multilayer Perceptron (24163 Obs/Sec).

