A hybrid model for predicting human physical activity status from lifelogging data

https://doi.org/10.1016/j.ejor.2019.05.035

Highlights

  • A real-time monitoring system of individual lifelogging data using wearable sensors.

  • A novel hybrid model to predict fine-grained physical activity status over time.

  • Competitive model performance with better generality and flexible non-linearity.

  • A personalised healthcare decision support tool allowing patient empowerment.

  • Contributes to the recent use of operations research and machine learning in healthcare.

Abstract

One trend in recent healthcare transformations is that people are encouraged to monitor and manage their health based on their daily diets and physical activity habits. However, much of the attention given to operational research and analytical models in healthcare has focused on the system level, such as national or regional policy making or organisational issues. This paper proposes a model concerned with healthcare analytics at the individual level, which can predict human physical activity status from sequential lifelogging data collected from wearable sensors. The model has a two-stage hybrid structure (in short, MOGP-HMM): a multi-objective genetic programming (MOGP) algorithm in the first stage to reduce the dimensions of the lifelogging data, and a hidden Markov model (HMM) in the second stage to predict activity status over time. It can be used as a decision support tool to provide real-time monitoring, statistical analysis and personalised advice to individuals, encouraging positive attitudes towards healthy lifestyles. We validate the model with real data collected from a group of participants in the UK, and compare it with other popular two-stage hybrid models. Our experimental results show that the MOGP-HMM can achieve comparable performance. To the best of our knowledge, this is the first study that uses the MOGP in a two-stage hybrid structure for predicting individuals’ activity status. It fits seamlessly with the current trend towards patient empowerment in the UK healthcare transformation, and contributes to a strategic development for more efficient and cost-effective provision of healthcare.

Introduction

With the significant development of technologies and radical changes in the socio-economic environment, the management planning and decision making faced by businesses have become increasingly complex, requiring the use of sophisticated analytical tools. Operational research techniques (e.g., optimisation, forecasting, simulation) together with other quantitative disciplines (e.g., probability theory, statistics, machine learning, data mining) are particularly useful for addressing these challenges (Chen, Kim, Oztekin, & Sundaramoorthi, 2018; Grünig & Kühn, 2013; Hindle & Vidgen, 2018). Although the contributions of these techniques and models are themselves well documented, the term business analytics has become established over the past decade (Doumpos & Zopounidis, 2016). Business analytics, or simply analytics, uses data, information technology, statistical analysis, mathematical models, optimisation techniques and computer-based simulations to gain improved insight into business operations and make better, fact-based decisions (Evans, 2017). In other words, business analytics is a new multidisciplinary subject which combines the fields of operational research, machine learning, data mining, statistics, big data, and so on (Mortenson, Doherty, & Robinson, 2015). It highlights the growing need to use quantitative approaches for management planning and decision making in a broader context encompassing data, processes, and systems, through the integration of traditional problem structuring and solving paradigms with data management and reporting tools, in a way that facilitates learning and action planning in an operational framework (Doumpos & Zopounidis, 2016).

Healthcare is one of the world’s largest industries, with many people involved either as employees in healthcare systems or as consumers of healthcare services. Four decades ago, scholars started to use operational research techniques to design healthcare systems and to improve healthcare service delivery (Fries, 1976; Krischer, 1980). The European Working Group on Operational Research Applied to Health Services (ORAHS) has been organising annual meetings since 1975. Many of the operational research studies in healthcare have focused on the application of systematic analysis (Brailsford & Vissers, 2011), such as national or regional policy making and organisational issues. Over the years, technology has revolutionised the way we live, learn and work. It has also been one of the forces driving healthcare transformation. One trend is that people are encouraged to monitor and manage their health based on their daily eating and physical activity habits, in line with people-centred healthcare and patient empowerment (World Health Organization, 2014b). For example, Rudner, McDougall, Sailam, Smith, and Sacchetti (2016) reported a case in which a doctor suggested that a patient who had a history of seizures should wear a Fitbit. This device is a wearable sensor that can track the patient’s pulse rate and record it through a mobile phone application. The doctor then used the lifelogging data collected from the Fitbit to identify an irregular heartbeat that coincided with a grand mal seizure that had occurred three hours earlier. This is a successful application of business analytics in healthcare (sometimes called healthcare analytics) at the individual level.

In this paper, we propose a new model concerned with individual healthcare analytics. Our model can predict human physical activity status from sequential lifelogging data collected from portable devices such as mobile phones and wearable sensors. Physical activity refers to any bodily movement produced by skeletal muscles that requires energy expenditure, including activities undertaken while working, playing, travelling, carrying out household tasks and engaging in recreational pursuits (World Health Organization, 2017). According to World Health Organization (2014a), “Insufficient physical activity is one of the 10 leading risk factors for global mortality, causing some 3.2 million deaths each year. In 2010, insufficient physical activity caused 69.3 million disability-adjusted life years (DALYs) – 2.8% of the total – globally”. As regular physical activity for adults can reduce the risk of cardiovascular disease, diabetes, cancer and all-cause mortality, the World Health Organization has set a global target to reduce the prevalence of insufficient physical activity by 10% by 2025. Reaching this target requires multisectoral collaboration among government departments and organisations. At the individual level, early disease detection and timely treatment are an effective and economical approach. The use of portable devices such as mobile phones, smart watches and fitness trackers to recognise and monitor human activities has recently been investigated for individual health self-management, and it has become an emerging topic in healthcare analytics.

Many conventional studies employ descriptive statistics to summarise lifelogging data and set thresholds as minimum requirements, in terms of daily or weekly walking steps or other metrics, to estimate human physical activity status (Caspersen, Powell, & Christenson, 1985; Choi, Pak, Choi, & Choi, 2007; Pate, Pratt, Blair, Haskell, Macera, Bouchard, Buchner, Ettinger, Heath, King, Kriska, Leon, Marcus, Morris, Paffenbarger, Patrick, Pollock, Rippe, Sallis, & Wilmore, 1995). However, these studies have two major limitations. First, they usually classify human physical activity status into only two states, active or inactive, which offers limited insight and prevents broader applications; a finer-grained classification would measure physical activity status more informatively. Second, many conventional studies describe only the static characteristics of the data without considering historical information. This limitation is particularly evident in individual health self-management, because the pattern of physical activity differs from one person to the next. Therefore, when high-dimensional sequential lifelogging data is collected from wearable sensors, it is worth considering individuals’ sequential activities and the effects of previous activities on the current activity status (Gurrin, Smeaton, & Doherty, 2014; Zhou & Gurrin).

Our proposed model has a two-stage hybrid structure (in short, MOGP-HMM). It contains a multi-objective genetic programming (MOGP) algorithm in the first stage and a hidden Markov model (HMM) in the second stage. The MOGP alleviates the first limitation mentioned above: it is a multi-class classifier that transforms the high-dimensional feature space of the collected lifelogging data into a new discrete class space which represents activity observations. The HMM in the second stage addresses the second limitation. It is a chain-structured Bayesian network which can be used to exploit the sequential patterns in observations. Simply put, an individual’s physical activity status at each time step is described by a latent variable, and the latent variables over time are connected through a Markov process rather than being independent of each other. Since scoring systems have been widely used in assessing quality of life (QoL), such as the QoL questionnaires VF-14 (Terwee, Gerding, Dekker, Prummel, & Wiersinga, 1998) and SF-12 (Gandek et al., 1998), observations and physical activity status in our study are both expressed in terms of a measurement score ranging from the inactive state to the highly active state. Given a time series of observations, the HMM can predict an individual’s activity status accordingly. We validate the model with real lifelogging data collected from a group of participants in the UK, and conduct experiments in a supervised learning setting (Bishop, 2007) where the scores (or states) of activity status are labelled based on the UK national health guidelines (UK National Health Service, 2015). We also compare our model with another popular hybrid model, the SVM-HMM, which combines a support vector machine (SVM) with an HMM. Our experimental results show that the MOGP-HMM achieves performance comparable to that of the SVM-HMM. However, unlike SVMs, our MOGP-HMM model is not sensitive to the choice of kernel functions and thus provides more robust and discriminative representations of sparse data.
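To make the second stage concrete, the following is a minimal sketch of how a first-order HMM can turn a sequence of discrete observations (such as class labels produced by a first-stage classifier) into a sequence of activity states, using standard Viterbi decoding. It is an illustration under stated assumptions rather than the implementation used in this paper: the function name `viterbi`, the three-state layout (inactive, moderately active, highly active) and all probability values are hypothetical.

```python
import math


def viterbi(observations, start_prob, trans_prob, emit_prob):
    """Most likely hidden-state sequence of a first-order HMM (log domain).

    observations  : list of observation indices, one per time step
    start_prob[i] : P(Z_1 = i)
    trans_prob[i][j] : P(Z_n = j | Z_{n-1} = i)
    emit_prob[i][k]  : P(O_n = k | Z_n = i)
    """
    n_states = len(start_prob)

    def log(p):
        # log(0) is treated as -infinity so impossible paths are never chosen
        return math.log(p) if p > 0 else float("-inf")

    # Best log-probability of any path ending in each state at the first step
    scores = [log(start_prob[i]) + log(emit_prob[i][observations[0]])
              for i in range(n_states)]
    backpointers = []

    for obs in observations[1:]:
        new_scores, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j
            best_i = max(range(n_states),
                         key=lambda i: scores[i] + log(trans_prob[i][j]))
            pointers.append(best_i)
            new_scores.append(scores[best_i] + log(trans_prob[best_i][j])
                              + log(emit_prob[j][obs]))
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda i: scores[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical parameters: states 0 = inactive, 1 = moderately active,
# 2 = highly active; observations use the same three labels.
start = [0.6, 0.3, 0.1]
trans = [[0.80, 0.15, 0.05],
         [0.20, 0.60, 0.20],
         [0.05, 0.15, 0.80]]
emit = [[0.7, 0.2, 0.1],
        [0.2, 0.6, 0.2],
        [0.1, 0.2, 0.7]]
observed = [0, 0, 2, 0, 1, 1, 2, 2]
print(viterbi(observed, start, trans, emit))
```

Because the transition matrix favours staying in the same state, an isolated observation that disagrees with its neighbours can be smoothed away by the decoder, which is the sense in which the HMM takes previous activities into account.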

The research in this paper is multidisciplinary and contributes to the recent use of operational research, machine learning, data mining, big data and the Internet of things in healthcare analytics. Firstly, this is one of the few studies which discuss the implementation of operational research in healthcare at the individual level (Royston, 1998). Secondly, lifelogging data poses a genuine big data problem because it is multidimensional, it contains many different features in different formats, and it can be retrieved continuously from wearable sensors. We develop a two-stage model to reduce the complexity of lifelogging data and then to predict an individual’s physical activity status over time. In essence, the proposed model is a personalised data-driven model based on state-of-the-art machine learning algorithms, so it contributes to the applications of machine learning. Further, our model can be deployed on a cloud server and used as a decision support tool to provide real-time monitoring, statistical analysis and personalised advice to an individual through portable digital devices. Therefore, it can be a practical application of the Internet of things in healthcare. Within the field of business analytics, our proposed model combines technology, quantitative methods and decision making, which Mortenson et al. (2015) identify as the key elements of business analytics. Similar to existing studies (Dag, Oztekin, Yucel, Bulur, & Megahed, 2017; Dag, Topuz, Oztekin, Bulur, & Megahed, 2016; Harris, May, & Vargas, 2016; Roumani, Roumani, Nwankpa, & Tanniru, 2018; Topuz, Uner, Oztekin, & Yildirim, 2018), our proposed model deals with predictive analytics. From a high-level perspective, this study fits seamlessly with the current trend towards patient empowerment in UK healthcare, and contributes to a strategic development for the provision of more efficient and cost-effective healthcare.

Technologically, using the MOGP also provides methodological contributions to two-stage hybrid modelling for physical activity prediction. It is a non-parametric optimisation classifier, differing from many genetic algorithms and machine learning models where parameters need to be set or trained in advance. It uses Pareto dominance to select GP tree models, taking into account the trade-off between model fitness and complexity. Therefore, the MOGP is more efficient and robust. Unlike the SVM, it is not sensitive to the choice of kernel functions and thus provides a more robust and discriminative representation of sparse data. As lifelogging data is usually sparse and noisy because each individual has his or her own activity pattern, the MOGP algorithm seems more suitable than the SVM for activity learning. Although GP algorithms have been used to evolve probabilistic trees that search for the optimal topology in bioinformatics (Won, Hamelryck, Prügel-Bennett, & Krogh, 2007) and stock trading (Chen, Mabu, Shimada, & Hirasawa, 2009; Ghaddar, Sakr, & Asiedu, 2016), to the best of our knowledge, this is the first work in which a MOGP algorithm has been used as a multi-class classifier to construct a classification-HMM hybrid model for solving sequential learning problems. Our model can be of interest to, and easily adapted for, other relevant domains in business analytics, such as consumer choice modelling (Blanchet, Gallego, & Goyal, 2016; Sandıkci, Maillart, Schaefer, Alagoz, & Robert, 2008) and high-dimensional business data classification or dimension reduction (Debaere, Coussement, & De Ruyckc, 2018; Ghaddar & Naoum-Sawaya, 2018).
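As an illustration of the selection principle described above, the sketch below filters a set of candidate models down to its Pareto front under two objectives that are both minimised. It assumes, for illustration only, that fitness is measured as a classification error and that complexity is a tree node count; the `Candidate` type, the function names and all values are hypothetical and not taken from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float      # classification error on training data (lower is better)
    complexity: int   # e.g. number of nodes in the GP tree (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.error <= b.error and a.complexity <= b.complexity
    strictly_better = a.error < b.error or a.complexity < b.complexity
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that is not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical population of evolved trees
population = [
    Candidate("tree_a", error=0.10, complexity=25),
    Candidate("tree_b", error=0.12, complexity=10),
    Candidate("tree_c", error=0.15, complexity=12),  # dominated by tree_b
    Candidate("tree_d", error=0.30, complexity=3),
    Candidate("tree_e", error=0.10, complexity=30),  # dominated by tree_a
]

for candidate in pareto_front(population):
    print(candidate)
```

The surviving candidates represent different trade-offs: none of them can be improved on one objective without becoming worse on the other, so no single weighting between fitness and complexity has to be fixed in advance.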

The remainder of the paper is organised as follows. Section 2 reviews the related literature. Section 3 introduces our proposed hybrid model. Section 4 describes our data, presents experimental results and gives an analysis. Section 5 concludes the paper.

Related work

Our study touches upon several streams of literature. In the following discussion, we review the related work in both healthcare and hybrid learning machines. For the former, we first discuss the recent studies on the use of operational research in healthcare at the country and organisational levels, and then individual health monitoring, prediction and self-management using wearable sensors. For the latter, we discuss the basic concepts and settings of hybrid learning machines and compare the

The MOGP-HMM

The proposed MOGP-HMM contains two stages: (i) a MOGP algorithm in the first stage; and (ii) a first-order HMM in the second stage. Fig. 1 presents a schematic view of the MOGP-HMM. The first-order HMM is represented as a chain-structured Bayesian network where $Z_1, \ldots, Z_N$ are the latent variables representing the human physical activity status over a finite time horizon $t_1, \ldots, t_N$, and $O_1, \ldots, O_N$ are the observations obtained by the MOGP algorithm based on the collected lifelogging data $X = [X_1, \ldots, X_N]$
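
For reference, a chain-structured network of this kind corresponds to the standard textbook factorisation of a first-order HMM, written here with $Z_n$ for the latent state and $O_n$ for the observation at time $t_n$. This is the generic form and is given as an assumption about the notation, since the section text above is truncated; the paper's own parameterisation may differ.

```latex
p(Z_1, \ldots, Z_N, O_1, \ldots, O_N)
  = p(Z_1) \prod_{n=2}^{N} p(Z_n \mid Z_{n-1}) \prod_{n=1}^{N} p(O_n \mid Z_n)
```

The first factor is the initial state distribution, the second product contains the state transitions, and the third contains the emission probabilities that link each latent activity status to its observation.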

Experiments

In this section, we introduce the collected lifelogging data, describe our experimental settings, and give an analysis of the experimental results.

Conclusion

In this paper, we propose a hybrid model MOGP-HMM to predict human physical activity status from sequential lifelogging data. The MOGP algorithm transforms the collected lifelogging data into observations, which are the input of the HMM. The latter is a chain-structured Bayesian network where the latent variables represent an individual’s physical activity status over time. Given a sequence of observations, an individual’s physical activity status can be predicted. We validate the proposed

Acknowledgments

This work was conducted with the support of the EPSRC grant MyLifeHub EP/L023679/1 and European FP7 collaborative project MyHealthAvatar (GA No: 600929).

References (78)

  • B. Gandek et al.

    Cross-validation of item selection and scoring for the SF-12 Health Survey in nine countries: results from the IQOLA Project

    Journal of Clinical Epidemiology

    (1998)
  • B. Ghaddar et al.

    Spare parts stocking analysis using genetic programming

    European Journal of Operational Research

    (2016)
  • S. Harris et al.

    Predictive analytics model for healthcare planning and scheduling

    European Journal of Operational Research

    (2016)
  • T.-H. Hejazi et al.

    A reliability-based approach for performance optimization of service industries: an application to healthcare systems

    European Journal of Operational Research

    (2019)
  • G. Hindle et al.

    Developing a business analytics methodology: a case study in the foodbank sector

    European Journal of Operational Research

    (2018)
  • Y. Li et al.

    Designing utilization-based spatial healthcare accessibility decision support systems: A case of a regional health plan

    Decision Support Systems

    (2017)
  • M. Mortenson et al.

    Operational research from Taylorism to Terabytes: a research agenda for the analytics age

    European Journal of Operational Research

    (2015)
  • S. Peddabachigari et al.

    Modeling intrusion detection system using hybrid intelligent systems

    Journal of Network and Computer Applications

    (2007)
  • G. Royston

    Shifting the balance of health care into the 21st century

    European Journal of Operational Research

    (1998)
  • J. Rudner et al.

    Interrogation of patient smartphone activity tracker to assist arrhythmia management

    Annals of Emergency Medicine

    (2016)
  • A. Tako et al.

    PartiSim: a multi-methodology framework to support facilitated simulation modelling in healthcare

    European Journal of Operational Research

    (2015)
  • G. Willis et al.

    Strategic workforce planning in healthcare: a multi-methodology approach

    European Journal of Operational Research

    (2018)
  • L. Zadeh

    Fuzzy sets

    Information and Control

    (1965)
  • Auria, L., & Moro, R. A. (2008). Support vector machines as a technique for solvency analysis. Deutsches Institut für...
  • H. Banaee et al.

    Data mining for wearable sensors in health monitoring systems: a review of recent trends and challenges

    Sensors

    (2013)
  • C. Bishop

    Pattern recognition and machine learning

    (2007)
  • J. Blanchet et al.

    A Markov chain approximation to choice modeling

    Operations Research

    (2016)
  • L. Borrajo et al.

    Hybrid neural intelligent system to predict business failure in small-to-medium-size enterprises

    International Journal of Neural Systems

    (2011)
  • J. Burkardt

    The truncated normal distribution

    (2014)
  • C. Caspersen et al.

    Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research

    Public Health Reports

    (1985)
  • V. Chen et al.

    Preface: data mining and analytics

    Annals of Operations Research

    (2018)
  • B. Choi et al.

    Daily step goal of 10,000 steps: a literature review

    Clinical & Investigative Medicine

    (2007)
  • Concha, O., Yi, R., Xu, D., Moghaddam, Z., & Piccardi, M. (2011). HMM-MIO: an enhanced hidden Markov model for action...
  • N. Cristianini et al.

    An introduction to support vector machines and other kernel-based learning methods

    (2000)
  • P. Domingos

    The master algorithm: How the quest for the ultimate learning machine will remake our world

    (2015)
  • J. Evans

    Business analytics: Methods, models, and decisions

    (2017)
  • B. Fries

    Bibliography of operations research in health care systems

    Operations Research

    (1976)
  • B. Ghaddar et al.

    High dimensional data classification and feature selection using support vector machines

    European Journal of Operational Research

    (2018)
  • Z. Ghahramani

    An introduction to hidden Markov models and Bayesian networks

    International Journal of Pattern Recognition and Artificial Intelligence

    (2001)

    Ji Ni is a Senior Applied Scientist at the Inception Institute of Artificial Intelligence. He was a Research Fellow of Machine Learning at the University of Lincoln working on the research of this paper.
