Cartesian genetic programming for diagnosis of Parkinson disease through handwriting analysis: Performance vs. interpretability issues

https://doi.org/10.1016/j.artmed.2020.101984

Highlights

  • A comparison among AI classification methods for the diagnosis of PD is performed.

  • Classification is performed on handwriting samples belonging to benchmark datasets.

  • Cartesian genetic programming (CGP) outperforms white box approaches in accuracy.

  • CGP outperforms black box methods in interpretability by providing explicit rules.

  • CGP classification rules provide guidelines for the design of diagnostic protocols.

Abstract

In the last decades, early disease identification through non-invasive and automatic methodologies has gathered increasing interest from the scientific community. Among others, Parkinson's disease (PD) has received special attention in that it is a severe and progressive neuro-degenerative disease. As a consequence, early diagnosis would provide more effective and prompt care strategies that could successfully influence patients’ life expectancy. However, the best performing systems implement the so-called black-box approach, which does not provide explicit rules to reach a decision. This lack of interpretability has hampered the acceptance of those systems by clinicians and their deployment in the field. In this context, we perform a thorough comparison of different machine learning (ML) techniques whose classification results are characterized by different levels of interpretability. Such techniques were applied to automatically identify PD patients through the analysis of handwriting and drawing samples. The analysis of the results shows that white-box approaches, such as Cartesian Genetic Programming and Decision Tree, achieve a twofold goal: they support the diagnosis of PD and provide explicit classification models, in which only a subset of features (related to specific tasks) is identified and exploited for classification. The obtained classification models provide important insights for the design of non-invasive, inexpensive and easy-to-administer diagnostic protocols. The comparison of the different ML approaches (in terms of both accuracy and interpretability) has been performed on the features extracted from the handwriting and drawing samples included in the publicly available PaHaW and NewHandPD datasets. The experimental findings show that Cartesian Genetic Programming outperforms the white-box methods in accuracy and the black-box ones in interpretability.

Introduction

In recent years, artificial intelligence (AI) methodologies have been receiving increasing interest from the scientific community, especially in the healthcare field. Indeed, AI systems are able to digest and analyze the huge amounts of data originating from patient records and medical procedures. This, in turn, would provide more effective strategies for both the diagnosis and the treatment of diseases.

Even though many approaches for predicting patients’ health have been proposed in the literature, very few of them have made their way into hospitals and clinics, because of clinicians' increasing concerns with the issues of understandability and trust. These systems are unconvincing in the medical field since, typically, they adopt a black-box approach to providing the answer.

Indeed, the importance of interpretability (what the model makes explicit about the relationship between input and output) and transparency (how the model works), in a word explainability, of the decisions made by autonomously learning machines [1], [2] is evident when such approaches are of interest in critical real-world applications, such as medical diagnosis. Consequently, explainable AI (XAI) [3], [4] is gaining increasing interest in the medical domain, since it is able to provide the rationale behind the decision-making processes underlying automatic diagnosis. Therefore, there is a high demand for alternative learning approaches that provide explicit logical causal models [5], rather than approaches exhibiting levels of precision comparable to or surpassing those achieved by humans while relying on opaque models.

As regards neuro-degenerative diseases, the scientific community is working on different protocols for the early diagnosis of one of the most common neuro-degenerative disorders, i.e. Parkinson's disease (PD). Four categories of biomarkers have been proposed for the diagnosis of the disease: clinical, imaging, biochemical, and genetic. Each biomarker has a different predictive value and few, if any, have been found useful for early diagnosis [6], [7]. Many of them, such as olfactory dysfunction [8] and α-synuclein biopsy [9], could help with early diagnosis, but they need large-scale clinical studies in order to accurately evaluate their predictive and practical usefulness [10], [11]. Moreover, for such a complex disease, some studies have suggested that combinations of different biomarkers might be more accurate than any single measure [7], [11].

However, several insights into the motor and neural processes occurring under both physiological and pathological conditions in patients affected by PD have been gathered through the analysis of handwriting [12], [13], [14], [15], [16]. Basal ganglia activity [17], which is impaired in people affected by the disease [18], influences motor tasks involving fine control of complex movements [19], [20], [21], such as handwriting production. Consequently, handwriting analysis could provide a cheap and non-invasive tool for supporting the early diagnosis of the disease [22] and a method for evaluating disease progression [23].

Several AI-based approaches have been recently proposed for PD diagnosis through handwriting analysis. The analyzed tasks include both handwriting (from letters to sentences) [24], [25] and drawings (meanders and spirals) [26], [27], [28]. However, the majority of the AI-based classification techniques adopted (such as Decision Trees, Random Forest classifiers, Support Vector Machines, Artificial Neural Networks) provide few or no clues about the criteria exploited to reach a decision, hampering their use for PD diagnosis by physicians, who are mainly concerned with understandability and trust issues.

A Cartesian genetic programming (CGP) [29], [30] classification approach for PD diagnosis has been proposed in previous works [31], [32]. The CGP approach is able to provide the explicit decision criteria used for the diagnosis. We showed that such an approach not only achieves performance comparable to other AI approaches in discriminating between the handwriting of PD patients and healthy subjects, but also yields intelligible classification rules and highlights the most informative features involved in the diagnosis.
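
For illustration only, the following is a minimal, simplified sketch of how a CGP individual can be decoded into an explicit expression over handwriting features and thresholded into a PD/healthy decision. The genome layout, function set, feature names and example values are assumptions made for this sketch; they do not reproduce the implementation used in [31], [32].

```python
# Minimal, illustrative CGP decoder (simplified: single row, all nodes evaluated;
# a full CGP implementation decodes only the nodes that are active for the output).
import operator

FUNCTIONS = [operator.add, operator.sub, operator.mul,
             lambda a, b: a / b if abs(b) > 1e-9 else 1.0]  # protected division

def evaluate_cgp(nodes, output_gene, features):
    """Decode the node genes and return the raw program output for one sample."""
    values = list(features)                # indices 0..n_inputs-1 hold the input features
    for func_idx, in_a, in_b in nodes:     # each node: (function gene, two connection genes)
        values.append(FUNCTIONS[func_idx](values[in_a], values[in_b]))
    return values[output_gene]

def classify(nodes, output_gene, features, threshold=0.0):
    """Binary decision rule: label as PD when the evolved expression exceeds the threshold."""
    return int(evaluate_cgp(nodes, output_gene, features) > threshold)

# Hypothetical individual over three features (e.g. velocity, pressure, stroke duration):
# node 3 = x0 + x2, node 4 = node3 * x1; the output gene points at node 4.
nodes = [(0, 0, 2), (2, 3, 1)]
print(classify(nodes, output_gene=4, features=[0.8, 1.5, -0.2]))  # -> 1 (0.9 > 0)
```

The key point is that the evolved individual is itself a readable formula over the input features, which is what makes the resulting decision criteria explicit.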

In this work we aim to investigate whether further insights could be provided by the analysis of handwritten letters and words with reference to the interpretability of the underlying prediction model adopted. In particular, here we (a) show how the interpretability of the particular learning model adopted for supporting the diagnosis is related to the performance exhibited by the model itself, (b) quantitatively estimate the trade-off between interpretability and performance that physicians have to take into account when considering a specific tool for the diagnosis of PD, and (c) show that an approach involving interpretable machine learning methods can be used by physicians for designing fine-tuned handwriting protocols for the diagnosis of the disease. To this end, we have performed our analysis by taking into account three machine learning techniques mainly used in the literature, ranging from the most to the least explainable, i.e. Decision Tree (DT) [33], Random Forest (RF) [34] and Support Vector Machines (SVM) [35]. To enrich our analysis we also applied Cartesian Genetic Programming (CGP) [29], as it represents a technique able to provide an explicit representation of the criteria for discriminating between PD patients and healthy subjects; a concrete illustration of the interpretability gap is sketched below. The experimental testing has been carried out by using two widely used datasets for PD diagnosis, i.e. PaHaW [24] and NewHandPD [36].
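
As a concrete illustration of that interpretability gap, the sketch below fits a Decision Tree and an RBF-kernel SVM on synthetic placeholder data (the feature names, data and hyper-parameters are illustrative assumptions, not the paper's tuned settings or extracted features) and shows that only the former exposes explicit, readable rules.

```python
# Sketch: white-box vs. black-box interpretability on placeholder handwriting-like features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))                       # 60 subjects, 3 illustrative features
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)      # synthetic PD/control labels

dt = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(dt, feature_names=["velocity", "pressure", "duration"]))  # explicit rules

svm = SVC(kernel="rbf").fit(X, y)                  # may reach comparable accuracy,
# ...but the fitted SVC exposes only support vectors and kernel weights, no readable rules.
```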

We found that the CGP approach outperforms the white-box methods in accuracy and the black-box ones in interpretability. Consequently, we exploited the classification model obtained by the CGP to design a handwriting protocol for the diagnosis of PD that, compared to existing diagnostic methods, is non-invasive, inexpensive and quick to administer.

The work is organized as follows: Section 2 reports the state of the art in the field of handwriting analysis for the diagnosis of Parkinson's disease; Section 3 describes the proposed approach, the datasets used in the study and the features extracted from each dataset; Section 4 describes the parameter calibration, obtained through an exploratory tuning phase, and discusses the obtained results; eventually, Section 5 sketches the conclusions and future directions of the work.

Section snippets

State of the art

The most recent and complete reviews of the state of the art of machine learning applied to the diagnosis of Parkinson's disease are presented in [37], [38]. The former has the main goal of evaluating how effective machine learning techniques for automatic diagnosis are when handling the problem of PD identification. The authors highlight that the majority of works make use of signal- or image-based data acquired by using sensors and that many approaches adopted in the literature require some sort of

Classifiers

In the following, we briefly summarize the machine learning tools we have compared and the datasets used for both performance evaluation and assessment of the trade-off between performance and explainability.

Training and test set construction

Because both datasets contain samples drawn by few subjects, we adopted 10-fold stratified cross-validation with shuffling as the resampling method to achieve a more reliable evaluation of the classification performance [38], [24].

Each dataset was shuffled 10 times and, at each shuffle, it was divided into 10 mutually exclusive and exhaustive subsets. Each subset, one at a time, was selected as the test set and the union of all other subsets was used as the training set. It follows that the
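
A minimal sketch of this resampling scheme, assuming scikit-learn; the feature matrix, labels and the SVM placeholder estimator are illustrative assumptions, and the same loop would be applied to each classifier under comparison.

```python
# 10 shuffles x stratified 10-fold CV, i.e. 100 train/test splits per classifier (sketch).
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(75, 20))          # placeholder: 75 subjects, 20 handwriting features
y = rng.integers(0, 2, size=75)        # placeholder labels: 1 = PD, 0 = healthy control

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
accuracies = []
for train_idx, test_idx in cv.split(X, y):
    clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])   # any classifier under test
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean accuracy over {len(accuracies)} splits: {np.mean(accuracies):.3f}")
```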

Conclusions

In the context of the diagnosis of Parkinson's disease through handwriting analysis, we have addressed the problem of estimating the trade-off between accuracy and interpretability for the most widely adopted AI-based methods proposed in the literature. For this purpose, we have selected four classifiers exhibiting different levels of interpretability and compared the accuracy they achieve on two publicly available datasets widely adopted for performance evaluation. The hyper-parameters of

Conflict of interest

None declared.

Acknowledgment

The work reported in this paper was partially funded by the “Bando PRIN 2015 – Progetto HAND” under Grant H96J16000820001 from the Italian Ministero dell’Istruzione, dell’Università e della Ricerca.

References (59)

  • I. De Falco et al., A genetic programming-based regression for extrapolating a blood glucose-dynamics model from interstitial glucose measurements and their first derivatives, Appl Soft Comput (2019)

  • I. De Falco et al., Genetic programming-based induction of a glucose-dynamics model for telemedicine, J Netw Comput Appl (2018)

  • A. Borrelli et al., Performance of genetic programming to extract the trend in noisy data series, Physica A: Stat Mech Appl (2006)

  • H. Hagras, Toward human-understandable, explainable AI, Computer (2018)

  • F. Doshi-Velez et al., Towards a rigorous science of interpretable machine learning (2017)

  • A. Adadi et al., Peeking inside the black-box: a survey on explainable artificial intelligence (XAI), IEEE Access (2018)

  • D. Doran et al., What does explainable AI really mean? A new conceptualization of perspectives (2017)

  • P.P. Angelov et al., Toward anthropomorphic machine learning, Computer (2018)

  • W. Le et al., Can biomarkers help the early diagnosis of Parkinson’s disease?, Neurosci Bull (2017)

  • T. Li et al., Biomarkers for Parkinson’s disease: how good are they?, Neurosci Bull (2020)

  • J.F. Morley et al., Optimizing olfactory testing for the diagnosis of Parkinson’s disease: item analysis of the University of Pennsylvania Smell Identification Test, NPJ Parkinson’s Dis (2018)

  • D.M. O’Hara et al., Methods for detecting toxic α-synuclein species as a biomarker for Parkinson's disease, Crit Rev Clin Lab Sci (2020)

  • C.-W. Chang et al., Plasma and serum alpha-synuclein as a biomarker of diagnosis in patients with Parkinson’s disease, Front Neurol (2020)

  • A.H.V. Schapira, Recent developments in biomarkers in Parkinson disease, Curr Opin Neurol (2013)

  • M.P. Broderick et al., Hypometria and bradykinesia during drawing movements in individuals with Parkinson’s disease, Exp Brain Res (2009)

  • A.W.A. Van Gemmert et al., Parkinson’s disease patients undershoot target size in handwriting and similar tasks, J Neurol Neurosurg Psychiatry (2003)

  • R. Senatore et al., A neural scheme for procedural motor learning of handwriting, Proceedings – International Conference on Frontiers in Handwriting Recognition (2012)

  • J. Jankovic, Parkinson’s disease: clinical features and diagnosis (2008)

  • A. Marcelli et al., Some observations on handwriting from a motor learning perspective, CEUR Workshop Proceedings, vol. 1022 (2013)