Cartesian genetic programming for diagnosis of Parkinson disease through handwriting analysis: Performance vs. interpretability issues
Introduction
In recent years, artificial intelligence (AI) methodologies have received increasing interest from the scientific community, especially in the healthcare field. Indeed, AI systems can digest and analyze huge amounts of data originating from patient records and medical procedures. This, in turn, could provide more effective strategies for both the diagnosis and the treatment of disease.
Even though many approaches for predicting patients’ health have been proposed in the literature, very few of them have made their way into hospitals and clinics, because clinicians are increasingly concerned with issues of understandability and trust. These systems are unconvincing in the medical field since they typically adopt a black-box technique in providing the answer.
Indeed, the importance of interpretability (what the model makes explicit about the relationship between input and output) and transparency (how the model works), in one word explainability, of the decisions made by autonomously learning machines [1], [2] is evident when such approaches are applied in critical real-world settings, such as medical diagnosis. Consequently, explainable AI (XAI) [3], [4] is gaining increasing interest in the medical domain, since it is able to provide the rationale behind the decision-making processes involved in automatic diagnosis. Therefore, there is a high demand for alternative learning approaches that provide explicit logical causal models [5], rather than approaches that reach levels of precision comparable to or surpassing those achieved by humans but rely on opaque models.
As regards neuro-degenerative diseases, the scientific community is working on different protocols for the early diagnosis of one of the most common neuro-degenerative disorders, i.e. Parkinson's disease (PD). Four categories of biomarkers have been proposed for the diagnosis of the disease: clinical, imaging, biochemical, and genetic. Each biomarker has a different predictive value and few, if any, have been found useful for early diagnosis [6], [7]. Many of them, such as olfactory dysfunction [8] and α-synuclein biopsy [9], could help with early diagnosis, but they need large-scale clinical studies in order to accurately evaluate their predictive and practical usefulness [10], [11]. Moreover, for such a complex disease, some studies have suggested that combinations of different biomarkers might be more accurate than a single measure [7], [11].
However, several insights about the motor and neural processes occurring during both physiological and pathological conditions in patients affected by PD have been gathered through the analysis of handwriting [12], [13], [14], [15], [16]. Basal ganglia activity [17], which is impaired in people affected by the disease [18], influences motor tasks involving fine control of complex movements [19], [20], [21], such as handwriting production. Consequently, handwriting analysis could provide a cheap and non-invasive tool for supporting the early diagnosis of the disease [22] and a method for evaluating disease progression [23].
Several AI-based approaches have recently been proposed for PD diagnosis through handwriting analysis. Analyzed tasks include both handwriting (from letters to sentences) [24], [25] and drawings (meanders and spirals) [26], [27], [28]. However, the majority of the AI-based classification techniques adopted (such as Decision Trees, Random Forest classifiers, Support Vector Machines, Artificial Neural Networks) provide few or no clues about the criteria exploited for reaching a decision, which hampers their use for PD diagnosis by physicians, who are mainly concerned with understandability and trust issues.
A Cartesian genetic programming (CGP) [29], [30] classification approach for PD diagnosis has been proposed in previous works [31], [32]. The CGP approach is able to provide the explicit decision criteria used for the diagnosis. We showed that such an approach not only provides performance comparable to other AI approaches in discriminating between the handwriting production of PD patients and healthy subjects, but also yields intelligible classification rules and highlights the most informative features involved in the diagnosis.
In this work we aim to investigate whether further insights could be provided by the analysis of handwritten letters and words with reference to the interpretability of the underlying prediction model adopted. In particular, here we (a) show how the interpretability of the particular learning model adopted for supporting the diagnosis is related to the performance exhibited by the model itself, (b) quantitatively estimate the trade-off between interpretability and performance that physicians have to take into account in the diagnosis of PD when considering a specific tool, and (c) show that an approach involving interpretable machine learning methods can be used by the physician for designing fine-tuned handwriting protocols for the diagnosis of the disease. To this end, we have performed our analysis by taking into account three machine learning techniques widely used in the literature, ranging from the most to the least explainable, i.e. Decision Tree (DT) [33], Random Forest (RF) [34] and Support Vector Machines (SVM) [35]. To enrich our analysis we also applied Cartesian Genetic Programming (CGP) [29], as it represents a technique able to provide an explicit representation of the criteria for discriminating between PD patients and healthy subjects. The experimental testing has been carried out by using two widely used datasets for PD diagnosis, i.e. PaHaW [24] and NewHandPD [36].
We found that the CGP approach outperforms the white-box methods in accuracy and the black-box ones in interpretability. Consequently, we exploited the classification model obtained by CGP for designing a handwriting protocol for the diagnosis of PD that, compared to existing diagnostic methods, is non-invasive, inexpensive and quick to administer.
The work is organized as follows: Section 2 reports the state of the art in the field of handwriting analysis for the diagnosis of Parkinson's disease; Section 3 describes the proposed approach, the datasets used in the study and the features extracted from each dataset; Section 4 describes the parameter calibration, obtained through an exploratory tuning phase, and discusses the obtained results; finally, Section 5 sketches the conclusions and future directions of the work.
State of the art
The most recent and complete reviews of the state of the art of machine learning applied to the diagnosis of Parkinson's disease are presented in [37], [38]. The former has the main goal of evaluating how effective machine learning techniques for automatic diagnosis are when handling the problem of PD identification. The authors point out that the majority of works make use of signal- or image-based data acquired by using sensors and that many approaches adopted in the literature require some sort of
Classifiers
In the following, we briefly summarize the machine learning tools we have compared and the datasets used for both performance evaluation and assessment of the trade-off between performance and explainability.
Training and test set construction
Because both datasets contain samples drawn by few subjects, we adopted 10-fold stratified cross-validation with shuffling as the resampling method to achieve a more reliable evaluation of the classification performance [38], [24].
Each dataset was shuffled 10 times and, at each shuffle, it was divided into 10 mutually exclusive and exhaustive subsets. Each subset, one at a time, was selected as the test set and the union of all the other subsets was used as the training set. It follows that the
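The resampling scheme described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `repeated_stratified_kfold`, the `seed` parameter, the round-robin way of assigning samples to folds, and the toy labels are all assumptions introduced here for illustration only.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Hypothetical helper: each repeat reshuffles the samples and deals them
    into n_splits disjoint folds while keeping class proportions similar.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            # Deal the shuffled members round-robin, continuing where the
            # previous class stopped so that fold sizes stay balanced.
            for pos, idx in enumerate(members):
                folds[(offset + pos) % n_splits].append(idx)
            offset += len(members)

        # Each fold is the test set once; the other folds form the training set.
        for fold, test_idx in enumerate(folds):
            train_idx = [i for other in folds if other is not test_idx for i in other]
            yield repeat, fold, sorted(train_idx), sorted(test_idx)


# Toy labels (assumed, not taken from PaHaW or NewHandPD).
toy_labels = ["PD"] * 20 + ["HC"] * 20
splits = list(repeated_stratified_kfold(toy_labels))
```

With the default settings the generator produces one train/test split per fold and per shuffle, so a classifier would be trained and evaluated once on each of them and the scores averaged. Note that this sketch stratifies by sample; if several samples come from the same subject, one may prefer to split by subject instead.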
Conclusions
In the context of the diagnosis of Parkinson's disease through handwriting analysis, we have addressed the problem of estimating the trade-off between accuracy and interpretability for the most widely adopted AI-based methods proposed in the literature. For this purpose, we have selected four classifiers exhibiting different levels of interpretability and compared the accuracy they achieve on the two datasets that are publicly available and widely adopted for performance evaluation. The hyper-parameters of
Conflict of interest
None declared.
Acknowledgment
The work reported in this paper was partially funded by the “Bando PRIN 2015 – Progetto HAND” under Grant H96J16000820001 from the Italian Ministero dell’Istruzione, dell’Università e della Ricerca.
References (59)
- et al. A paradigm for emulating the early learning stage of handwriting: performance comparison between healthy controls and Parkinson’s disease patients in drawing loop shapes. Hum Mov Sci (2019)
- et al. Parkinsonism reduces coordination of fingers, wrist, and arm in fine motor control. Exp Neurol (1997)
- et al. Control of stroke size, peak acceleration, and stroke duration in Parkinsonian handwriting. Hum Mov Sci (1991)
- et al. Temporal evolution in synthetic handwriting. Pattern Recogn (2017)
- et al. Evaluation of handwriting kinematics and pressure for differential diagnosis of Parkinson’s disease. Artif Intell Med (2016)
- et al. A new computer vision-based approach to aid the diagnosis of Parkinson’s disease. Comput Methods Programs Biomed (2016)
- et al. A survey on computer-assisted Parkinson’s disease diagnosis. Artif Intell Med (2019)
- et al. Handwritten dynamics assessment through convolutional neural networks: an application to Parkinson’s disease identification. Artif Intell Med (2018)
- et al. Assessing visual attributes of handwriting for prediction of neurological disorders: a case study on Parkinson’s disease. Pattern Recogn Lett (2019)
- et al. Character preclassification based on genetic programming. Pattern Recogn Lett (2002)