A hybrid model for predicting human physical activity status from lifelogging data
Introduction
With the significant development of technologies and the radical changes in the socio-economic environment, the management planning and decision-making problems faced by businesses have become increasingly complex, requiring the use of sophisticated analytical tools. Operational research techniques (e.g., optimisation, forecasting, simulation) together with other quantitative disciplines (e.g., probability theory, statistics, machine learning, data mining) are particularly useful for addressing these challenges (Chen, Kim, Oztekin, & Sundaramoorthi, 2018; Grünig & Kühn, 2013; Hindle & Vidgen, 2018). Therefore, even though the contributions of the above techniques and models themselves are well-documented, the term business analytics has been established over the past decade (Doumpos & Zopounidis, 2016). Business analytics, or simply analytics, uses data, information technology, statistical analysis, mathematical models, optimisation techniques and computer-based simulations to gain improved insight into business operations and make better, fact-based decisions (Evans, 2017). In other words, business analytics is a new multidisciplinary subject which combines the fields of operational research, machine learning, data mining, statistics, big data, and so on (Mortenson, Doherty, & Robinson, 2015). It highlights the growing need to use quantitative approaches for management planning and decision making in a broader context encompassing data, processes, and systems through the integration of traditional problem structuring and solving paradigms with data management and reporting tools, in a way that facilitates learning and action planning in an operational framework (Doumpos & Zopounidis, 2016).
Healthcare is one of the world’s largest industries, with many people involved either as employees in healthcare systems or as consumers of healthcare services. Four decades ago, scholars started to use operational research techniques to design healthcare systems and to improve healthcare service delivery (Fries, 1976; Krischer, 1980). The European Working Group on Operational Research Applied to Health Services (ORAHS) has been organising annual meetings since 1975. Many of the operational research studies in healthcare have focused on the application of systematic analysis (Brailsford & Vissers, 2011) to areas such as national or regional policy making and organisational issues. Over the years, technology has revolutionised the way we live, learn and work. It has also been one of the forces driving healthcare transformation. One trend is that people are encouraged to monitor and manage their health based on their daily eating and physical activity habits, in line with people-centred healthcare and patient empowerment (World Health Organization, 2014b). For example, Rudner, McDougall, Sailam, Smith, and Sacchetti (2016) reported a case in which a doctor suggested that a patient who had a history of seizures should wear a Fitbit. This device is a wearable sensor that can track the patient’s pulse rate and record it through a mobile phone application. The doctor then used the lifelogging data collected from the Fitbit to successfully identify an irregular heartbeat that coincided with a grand mal seizure that had occurred three hours earlier. This is a successful application of business analytics in healthcare (sometimes called healthcare analytics) at the individual level.
In this paper, we propose a new model concerned with individual healthcare analytics. Our model can predict human physical activity status from sequential lifelogging data collected from portable devices such as mobile phones and wearable sensors. Physical activity refers to any bodily movement produced by skeletal muscles that requires energy expenditure, including activities undertaken while working, playing, travelling, carrying out household tasks and engaging in recreational pursuits (World Health Organization, 2017). According to the World Health Organization (2014a), “Insufficient physical activity is one of the 10 leading risk factors for global mortality, causing some 3.2 million deaths each year. In 2010, insufficient physical activity caused 69.3 million disability-adjusted life years (DALYs) – 2.8% of the total – globally”. As regular physical activity for adults can reduce the risk of cardiovascular disease, diabetes, cancer and all-cause mortality, the World Health Organization has set a global target to reduce the prevalence of insufficient physical activity by 10% by 2025. Reaching this target requires multisectoral collaboration among government departments and organisations. On an individual level, early disease detection and timely treatment are an effective and economical approach. The use of portable devices such as mobile phones, smart watches and fitness trackers to recognise and monitor human activities has recently been investigated for individual health self-management, and it has become an emerging topic in healthcare analytics.
Many conventional studies employ descriptive statistics to summarise lifelogging data and to determine certain thresholds as minimum requirements, in terms of daily or weekly walking steps or other metrics, for estimating human physical activity status (Caspersen, Powell, & Christenson, 1985; Choi, Pak, Choi, & Choi, 2007; Pate et al., 1995). However, those studies have two major limitations. First, human physical activity status is usually classified into only two states, active or inactive, which offers limited insight and prevents broader applications. A finer-grained classification could measure physical activity status more informatively. Second, many conventional studies only illustrate the static characteristics of data without considering historical information. This limitation is particularly evident in the case of individual health self-management, because the pattern of physical activity differs from one person to the next. Therefore, when high-dimensional sequential lifelogging data is collected from wearable sensors, it is worth considering individuals’ sequential activities and the effects of previous activities on the current activity status (Gurrin, Smeaton, & Doherty, 2014; Zhou & Gurrin).
Our proposed model has a two-stage hybrid structure (in short, MOGP-HMM). It contains a multi-objective genetic programming (MOGP) algorithm in the first stage and a hidden Markov model (HMM) in the second stage. The MOGP alleviates the first limitation mentioned above. It is a multi-class classifier that transforms the high-dimensional feature space of the collected lifelogging data into a new discrete class space which represents activity observations. The HMM in the second stage addresses the second limitation. It is a chain-structured Bayesian network which can be used to exploit the sequential patterns in the observations. Simply put, an individual’s physical activity status at a given time is described by a latent variable. Latent variables over time are connected through a Markov process rather than being independent of each other. Since scoring systems have been widely used in assessing quality of life (QoL), such as the QoL questionnaires VF-14 (Terwee, Gerding, Dekker, Prummel, & Wiersinga, 1998) and SF-12 (Gandek et al., 1998), observation and physical activity status in our study are both expressed in terms of a measurement score ranging from the inactive state to the highly active state. Given a time series of observations, the HMM can predict an individual’s activity status accordingly. We validate the model with real lifelogging data collected from a group of participants in the UK, and conduct experiments in a supervised learning setting (Bishop, 2007) where the scores (or states) of activity status are labelled based on the UK national health guidelines (UK National Health Service, 2015). We also compare our model with another popular hybrid model, the SVM-HMM, which combines a support vector machine (SVM) with an HMM. Our experimental results show that the MOGP-HMM achieves performance comparable to that of the SVM-HMM. However, unlike SVMs, our MOGP-HMM model is not sensitive to the choice of kernel functions and thus provides more robust and discriminative representations of sparse data.
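As an illustration only, the two-stage idea (a classifier that emits discrete observations, followed by an HMM that decodes activity status over time) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the paper's implementation: `stage_one` is a hypothetical threshold rule standing in for the evolved MOGP classifier, the three-level score and the step thresholds are invented for the example, and the HMM parameters are estimated by simple smoothed counting from labelled sequences, which is one common choice in a supervised setting.

```python
import math

STATES = [0, 1, 2]  # hypothetical scores: 0 = inactive, 1 = active, 2 = highly active


def stage_one(features):
    """Stand-in for the stage-one classifier (assumption, not the MOGP itself).

    Maps a feature dictionary to a discrete observation using made-up
    thresholds on a single hypothetical 'steps' feature.
    """
    steps = features["steps"]
    if steps < 4000:
        return 0
    if steps < 9000:
        return 1
    return 2


def _normalise(row):
    total = sum(row)
    return [value / total for value in row]


def fit_hmm(obs_seqs, state_seqs, n=len(STATES), alpha=1.0):
    """Estimate initial, transition and emission probabilities by smoothed counting."""
    init = [alpha] * n
    trans = [[alpha] * n for _ in range(n)]
    emit = [[alpha] * n for _ in range(n)]
    for obs, states in zip(obs_seqs, state_seqs):
        init[states[0]] += 1
        for prev, cur in zip(states, states[1:]):
            trans[prev][cur] += 1
        for state, observation in zip(states, obs):
            emit[state][observation] += 1
    return _normalise(init), [_normalise(r) for r in trans], [_normalise(r) for r in emit]


def viterbi(obs, init, trans, emit):
    """Return the most probable latent state sequence for the observations."""
    n = len(init)
    score = [math.log(init[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][cur] = best predecessor of state cur at step t + 1
    for observation in obs[1:]:
        step, new_score = [], []
        for cur in range(n):
            best = max(range(n), key=lambda p: score[p] + math.log(trans[p][cur]))
            step.append(best)
            new_score.append(
                score[best] + math.log(trans[best][cur]) + math.log(emit[cur][observation])
            )
        back.append(step)
        score = new_score
    path = [max(range(n), key=lambda s: score[s])]
    for step in reversed(back):
        path.append(step[path[-1]])
    return path[::-1]
```

In use, each day's feature record would first be passed through `stage_one` to obtain an observation, and the resulting sequence would then be decoded with `viterbi` using parameters returned by `fit_hmm`. Because the transition probabilities link consecutive latent states, an isolated unusual observation can be smoothed out by its neighbours, which is the sequential effect the second stage is meant to capture.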
The research in this paper is multidisciplinary and contributes to the recent use of operational research, machine learning, data mining, big data and the Internet of things in healthcare analytics. Firstly, this is one of the few studies which discuss the implementation of operational research in healthcare at the individual level (Royston, 1998). Secondly, lifelogging data poses a genuine big data problem because it is multidimensional, it contains many different features in different formats, and it can be retrieved continuously from wearable sensors. We develop a two-stage model to reduce the complexity of lifelogging data and then to predict an individual’s physical activity status over time. In essence, the proposed model is a personalised data-driven model based on state-of-the-art machine learning algorithms, so it contributes to the applications of machine learning. Further, our model can be deployed on a cloud server and can be used as a decision support tool to provide real-time monitoring, statistical analysis and personalised advice to an individual through portable digital devices. Therefore, it can be a practical application of the Internet of things in healthcare. Within the field of business analytics, our proposed model involves technology, quantitative methods and decision making, which, as indicated by Mortenson et al. (2015), are the key elements of business analytics. Similar to existing studies (Dag, Oztekin, Yucel, Bulur, & Megahed, 2017; Dag, Topuz, Oztekin, Bulur, & Megahed, 2016; Harris, May, & Vargas, 2016; Roumani, Roumani, Nwankpa, & Tanniru, 2018; Topuz, Uner, Oztekin, & Yildirim, 2018), our proposed model deals with predictive analytics. From a high-level perspective in healthcare, this study fits with the current trend in UK healthcare towards patient empowerment, and contributes to a strategic development for the provision of more efficient and cost-effective healthcare.
Technology-wise, using the MOGP also provides methodological contributions to two-stage hybrid modelling for physical activity prediction. It is a non-parametric optimisation classifier, differing from many genetic algorithms and machine learning models where parameters need to be set or trained in advance. It uses Pareto dominance to select GP tree models, considering the trade-off between model fitness and complexity. Therefore, the MOGP is more efficient and robust. Unlike the SVM, it is not sensitive to the choice of kernel functions and thus provides more robust and discriminative representations of sparse data. As lifelogging data is usually sparse and noisy because each individual usually has his or her own activity pattern, the MOGP algorithm seems more suitable than the SVM for activity learning. Although GP algorithms have been used to evolve probabilistic trees that search for the optimal topology in bioinformatics (Won, Hamelryck, Prügel-Bennett, & Krogh, 2007) and stock trading (Chen, Mabu, Shimada, & Hirasawa, 2009; Ghaddar, Sakr, & Asiedu, 2016), to the best of our knowledge, this is the first work in which a MOGP algorithm has been used as a multi-class classifier to construct a classification-HMM hybrid model for solving sequential learning problems. Our model may be of interest to, and easily adapted to, other relevant domains in business analytics, such as consumer choice modelling (Blanchet, Gallego, & Goyal, 2016; Sandıkci, Maillart, Schaefer, Alagoz, & Robert, 2008) and high-dimensional business data classification or dimension reduction (Debaere, Coussement, & De Ruyckc, 2018; Ghaddar & Naoum-Sawaya, 2018).
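To make the Pareto-dominance idea concrete, the following Python sketch selects the non-dominated candidates from a small population of models scored on two objectives. It is an illustration under assumptions rather than the selection procedure used in the paper: the `Candidate` type, the choice of classification error as the fitness measure, and tree node count as the complexity measure are all hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float  # assumed fitness objective: classification error, lower is better
    size: int     # assumed complexity objective: number of tree nodes, lower is better


def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.error <= b.error and a.size <= b.size
    strictly_better = a.error < b.error or a.size < b.size
    return no_worse and strictly_better


def pareto_front(population):
    """Return the candidates that no other candidate dominates."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


population = [
    Candidate("A", 0.10, 25),
    Candidate("B", 0.12, 9),
    Candidate("C", 0.12, 14),  # dominated by B: same error, larger tree
    Candidate("D", 0.30, 3),
    Candidate("E", 0.10, 40),  # dominated by A: same error, larger tree
]
front = pareto_front(population)  # keeps A, B and D
```

The surviving candidates represent different trade-offs between accuracy and tree size; none of them can be improved on one objective without worsening the other, so no single weighting of the two objectives has to be fixed in advance.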
The remainder of the paper is organised as follows. Section 2 reviews the related literature. Section 3 introduces our proposed hybrid model. Section 4 describes our data, presents experimental results and gives an analysis. Section 5 concludes the paper.
Related work
Our study touches upon several streams of literature. In the following discussion, we review the related work in both healthcare and hybrid learning machines. For the former, we first discuss the recent studies on the use of operational research in healthcare at the country and organisational levels, and then individual health monitoring, prediction and self-management using wearable sensors. For the latter, we discuss the basic concepts and settings of hybrid learning machines and compare the
The MOGP-HMM
The proposed MOGP-HMM contains two stages: (i) a MOGP algorithm in the first stage; and (ii) a first-order HMM in the second stage. Fig. 1 presents a schematic view of the MOGP-HMM. The first-order HMM is represented as a chain-structured Bayesian network in which the latent variables represent the human physical activity status over a finite time horizon, and the observations are obtained by the MOGP algorithm based on the collected lifelogging data
Experiments
In this section, we introduce the collected lifelogging data, describe our experimental settings, and give an analysis of the experimental results.
Conclusion
In this paper, we propose a hybrid model MOGP-HMM to predict human physical activity status from sequential lifelogging data. The MOGP algorithm transforms the collected lifelogging data into observations, which are the input of the HMM. The latter is a chain-structured Bayesian network where the latent variables represent an individual’s physical activity status over time. Given a sequence of observations, an individual’s physical activity status can be predicted. We validate the proposed
Acknowledgments
This work was conducted with the support of the EPSRC grant MyLifeHub EP/L023679/1 and European FP7 collaborative project MyHealthAvatar (GA No: 600929).
References (78)
- et al. Hybrid learning machines. Neurocomputing (2009)
- et al. OR in health. European Journal of Operational Research (2008)
- et al. OR in healthcare: A European perspective. European Journal of Operational Research (2011)
- et al. A genetic network programming with learning approach for enhanced stock trading model. Expert Systems with Applications (2009)
- et al. Predicting heart transplantation outcomes through data analytics. Decision Support Systems (2017)
- et al. A probabilistic data-driven framework for scoring the preoperative recipient-donor heart transplant survival. Decision Support Systems (2016)
- et al. A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees. European Journal of Operational Research (2018)
- et al. Multi-label classification of member participation in online innovation communities. European Journal of Operational Research (2018)
- et al. Optimizing healthcare network design under reference pricing and parameter uncertainty. European Journal of Operational Research (2017)
- et al. Editorial to the special issue “business analytics”. Omega: The International Journal of Management Science (2016)
- Cross-validation of item selection and scoring for the SF-12 Health Survey in nine countries: results from the IQOLA Project. Journal of Clinical Epidemiology
- Spare parts stocking analysis using genetic programming. European Journal of Operational Research
- Predictive analytics model for healthcare planning and scheduling. European Journal of Operational Research
- A reliability-based approach for performance optimization of service industries: an application to healthcare systems. European Journal of Operational Research
- Developing a business analytics methodology: a case study in the foodbank sector. European Journal of Operational Research
- Designing utilization-based spatial healthcare accessibility decision support systems: A case of a regional health plan. Decision Support Systems
- Operational research from Taylorism to Terabytes: a research agenda for the analytics age. European Journal of Operational Research
- Modeling intrusion detection system using hybrid intelligent systems. Journal of Network and Computer Applications
- Shifting the balance of health care into the 21st century. European Journal of Operational Research
- Interrogation of patient smartphone activity tracker to assist arrhythmia management. Annals of Emergency Medicine
- PartiSim: a multi-methodology framework to support facilitated simulation modelling in healthcare. European Journal of Operational Research
- Strategic workforce planning in healthcare: a multi-methodology approach. European Journal of Operational Research
- Fuzzy sets. Information and Control
- Data mining for wearable sensors in health monitoring systems: a review of recent trends and challenges. Sensors
- Pattern recognition and machine learning
- A Markov chain approximation to choice modeling. Operations Research
- Hybrid neural intelligent system to predict business failure in small-to-medium-size enterprises. International Journal of Neural Systems
- The truncated normal distribution
- Physical activity, exercise, and physical fitness: definitions and distinctions for health-related research. Public Health Reports
- Preface: data mining and analytics. Annals of Operations Research
- Daily step goal of 10,000 steps: a literature review. Clinical & Investigative Medicine
- An introduction to support vector machines and other Kernel-based learning methods
- The master Algorithm: How the quest for the ultimate learning machine will remake our world
- Business analytics: Methods, models, and decisions
- Bibliography of operations research in health care systems. Operations Research
- High dimensional data classification and feature selection using support vector machines. European Journal of Operational Research
- An introduction to hidden Markov models and Bayesian networks. International Journal of Pattern Recognition and Artificial Intelligence
Ji Ni is a Senior Applied Scientist at the Inception Institute of Artificial Intelligence. He was a Research Fellow of Machine Learning at the University of Lincoln working on the research of this paper.