
Pattern Recognition Letters

Volume 73, 1 April 2016, Pages 41-43

Diagnosing a disorder in a classification benchmark

https://doi.org/10.1016/j.patrec.2016.01.004

Highlights

  • The UCI BUPA Liver Disorders dataset is a common classification benchmark.

  • The final variable in the dataset is a train/test indicator, not a classification label.

  • We have surveyed many papers which use this dataset.

  • A large majority of surveyed papers misinterpret the final column, with meaningless results.

Abstract

A large majority of the many hundreds of papers which use the UCI BUPA Liver Disorders data set as a benchmark for classification misunderstand the data and use an unsuitable dependent variable.

Section snippets

The UCI BUPA Liver Disorders data set

The BUPA Liver Disorders data set was created by BUPA Medical Research and Development Ltd. (hereafter “BMRDL”) during the 1980s as part of a larger health-screening database [10]. At the time the second author was developing machine learning software, including what may be the first tree-structured genetic programming (GP) system [3], and collaborating with the BMRDL researchers who collected the data. He went on to use the data set as a GP benchmark [4]. In 1990 the data set was donated on …

Misunderstanding

As stated, in the Liver data set, x6 is a dependent variable indicating the number of drinks, while x7 is a selector, intended to split the data into train and test subsets for one particular experiment. However, many papers interpret x6 as an independent variable, and x7 as the target for classification. In some cases it is explicitly claimed that x7 represents the presence or absence of a liver disorder. This is incorrect. In fact, the information in this data set which pertains to diagnosis is …
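To make the distinction concrete, the following minimal sketch (in Python, not taken from the paper) contrasts the mistaken reading of the columns with the documented one. It assumes the usual seven-column UCI file layout with the variables in the order x1–x7, a selector coded 1 for the training subset and 2 for the test subset, and the dichotomization at x6 > 5 mentioned under Actions; the file name, column names and selector coding are assumptions of the sketch, not statements from the paper.

    # Minimal sketch: documented vs. mistaken reading of the BUPA columns.
    # Assumptions: seven comma-separated columns in the order x1..x7, and a
    # selector coded 1 (train) / 2 (test); neither is asserted by the paper.
    import pandas as pd

    cols = ["x1", "x2", "x3", "x4", "x5", "x6_drinks", "x7_selector"]
    df = pd.read_csv("bupa.data", header=None, names=cols)

    # Mistaken (but common) reading: x7 treated as a liver-disorder label,
    # with x6 treated as just another feature.
    # y_wrong = df["x7_selector"]

    # Reading consistent with the data set's documentation: x7 only selects
    # rows for one particular train/test split; a usable dependent variable
    # is the dichotomized drinks count, x6 > 5.
    X = df[["x1", "x2", "x3", "x4", "x5"]]
    y = (df["x6_drinks"] > 5).astype(int)

    is_train = df["x7_selector"] == 1
    X_train, y_train = X[is_train], y[is_train]
    X_test, y_test = X[~is_train], y[~is_train]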

Discussion

There are several obviously desirable properties of classification algorithms: we want them to fit the training data well, to generalize well, to give interpretable models, and to run fast during training and during classification. Another less obvious but still desirable property is an ability to quantify confidence in the data fit, or in the classification of a particular point. Many researchers failed to detect that they were running classification on an artefactual variable, x7. Should the …

Actions

How should researchers respond? There are four options.

  • Continue as before. As described above, many researchers have reported significant improvements over baseline performance. We can conclude that the x7 variable cannot be entirely random. Moreover, the lift attained by logistic regression and more sophisticated methods, relative to “predict majority”, is much larger for x7 than for the dichotomized (x6 > 5); a sketch of this comparison follows the list. Therefore, some might argue that there is more to learn from continuing to use x7.
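The comparison invoked in this option can be illustrated with a short, hedged sketch that reuses the df loaded in the earlier sketch. The scikit-learn components, the ten-fold cross-validation, and the encoding of x7 as a 0/1 label are assumptions of the illustration, and the exact figures will depend on such choices.

    # Sketch of the "lift over predict-majority" comparison; reuses df from
    # the earlier sketch. Numbers are illustrative only.
    from sklearn.dummy import DummyClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    features_x1_x5 = ["x1", "x2", "x3", "x4", "x5"]
    settings = {
        # The common misuse: x1..x6 as features, x7 as the "class".
        "x7 as label": (df[features_x1_x5 + ["x6_drinks"]],
                        (df["x7_selector"] == 2).astype(int)),
        # The dichotomized drinks variable discussed in the text.
        "x6 > 5 as label": (df[features_x1_x5],
                            (df["x6_drinks"] > 5).astype(int)),
    }

    for name, (X, y) in settings.items():
        majority = cross_val_score(
            DummyClassifier(strategy="most_frequent"), X, y, cv=10).mean()
        logistic = cross_val_score(
            make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
            X, y, cv=10).mean()
        print(f"{name}: majority={majority:.3f}, "
              f"logistic={logistic:.3f}, lift={logistic - majority:.3f}")

A larger lift for the x7 target would only confirm, as the option above notes, that x7 is not entirely random; it would not make x7 a diagnostic label.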

Acknowledgments

Thanks to members of the UCD Natural Computing Research & Applications group for discussion. Thanks also to former members of BMRDL, Sharon Allaway, David Robinson and Roger Smolski, for helping to elucidate the status of the last column in the BUPA data set.

References (12)

  • J.C. Bezdek et al., Will the real Iris data please stand up?, IEEE Trans. Fuzzy Syst. (1999)
  • C.T. Brown, H.W. Bullen, S.P. Kelly, R.K. Xiao, S.G. Satterfield, J.G. Hagedorn, Visualization and Data Mining in an 3D...
  • R. Forsyth, BEAGLE – A Darwinian approach to pattern recognition, Kybernetes (1981)
  • R. Forsyth et al., Machine Learning: Applications in Expert Systems and Information Retrieval (1986)
  • M. Lichman, UCI machine learning repository, 2013. URL: http://archive.ics.uci.edu/ml (accessed...
  • G. Maurelli et al., Artificial neural networks for the identification of the differences between “light” and “heavy” alcoholics, starting from five nonlinear biological variables, Subst. Use Misuse (1998)
There are more references available in the full text version of this article.

This paper has been recommended for acceptance by Prof. A. Marcelli.
