A symbolic genetic programming approach for identifying models of learning-by-doing

https://doi.org/10.1016/j.cie.2018.08.020

Highlights

  • Symbolic regression is proposed to aid in identifying useful learning curve models.

  • Existing models are independently identified, supporting prior theory.

  • Empirical evidence is presented in support of two- and three-parameter models being effective.

Abstract

In this study, we apply a symbolic regression approach to generate and investigate new potential univariate learning curve functional forms to forecast human learning responses efficiently and stably. Past studies have compared learning models in the literature to one another. Yet, continued interest in model development and comparison suggests that the question remains open as to whether there are other useful and yet-undiscovered models. We address the question of whether the existing literature contains the best model choices, or whether additional forms have merit. We apply a multigenic genetic programming algorithm to secondary field data from a range of manual sewing tasks. We identified an array of potentially useful empirical forms and examined whether these forms match or improve upon extant forms. Among two-parameter functional forms, the log-linear form performed well in efficiency and stability both as a model of cumulative experience and as a model of cumulative working time. Among three-parameter functional forms, a hyperbolic model was found and ranked highest both as a model of cumulative work and as a model of cumulative time. We also found that four-parameter models show characteristics of over-fitting and offer only small marginal differences in efficiency and stability for models of cumulative working time, which suggests that a three-parameter model may be a good choice in general.

Introduction

With the ongoing developments in industrial technologies, organizations are putting greater effort into enhancing manufacturing workers’ productivity. To achieve such improvements, measuring and utilizing workers’ learning and forgetting behavior has become an important area of study. Correspondingly, learning and forgetting models have been investigated for some time. The learning curve has been used as a management tool for much of the twentieth century, and is seen as an effective method for estimating unit cost and labor trends. Restle and Greeno (1970) suggested that learning is a replacement process in which incorrect responses tend to be replaced by correct ones. A similar theory was also described by Vigil and Sarper (1994), who presented the view that the learning curve is a model of the continual reduction in unit cost or labor that occurs with increasing cumulative production. Learning Curve (LC) models measure and mathematically describe workers’ performance in repetitive tasks, and they admit multiple dependent variables, including time to produce a single unit, number of units produced per time interval, cost to produce a single unit, and percentage of non-conforming units, as described by Jaber (2016).
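As a rough illustration of the learning curve idea described above, the following Python sketch fits the log-linear form named in the abstract, written here as y = a * x**b, by ordinary least squares in log-log space. The parameterization, the function name `fit_log_linear`, and the synthetic data are assumptions made for illustration only; they are not taken from the paper or its dataset.

```python
import math

def fit_log_linear(x, y):
    """Least-squares fit of y = a * x**b via a straight line in log-log space."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)   # intercept mapped back to the original scale
    return a, b

# Synthetic, noise-free unit times (hypothetical values, not from the paper).
units = [1, 2, 4, 8, 16, 32]
times = [10.0 * u ** -0.3 for u in units]

a_hat, b_hat = fit_log_linear(units, times)
learning_rate = 2 ** b_hat  # fraction of unit time remaining after output doubles
```

With these synthetic inputs the fit recovers a = 10 and b = -0.3, so each doubling of cumulative output leaves roughly 81% of the previous unit time. The same fit applies to a production-rate response, in which case b would be positive.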

Several studies have compared learning models in the literature to one another. Yet, continued interest in developing and comparing models suggests that the question remains unsettled regarding whether there are as-yet-undiscovered models that may be useful. Research on the nature of the structure and mathematical form of the human learning curve is broad and includes work from the areas of psychology, mathematics, and many fields of engineering. We apply a novel symbolic regression approach using a multigenic Genetic Programming (GP) algorithm to secondary field data from a range of manual tasks. A multigenic approach closely follows the evolutionary inspiration, whereby genetic crossovers provide a useful mechanism to break away from local solutions and explore the solution space more broadly, while still being informed by the fitness of the individual genes. This study contributes to learning curve theory by independently discovering models previously suggested on theoretical grounds. We also contribute to practice by presenting an effective and efficient process by which future researchers and practitioners can obtain best-fit models for other learning scenarios.
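The role of crossover in symbolic regression can be sketched in a few lines of Python. The example below is a deliberately simplified, single-tree version: it is not the multigenic algorithm used in the study, and the tree encoding, the operator set, the protected division, and all function names are assumptions chosen for illustration. Mean squared error is used here as a stand-in fitness measure.

```python
import random

# An expression tree is "x", a numeric constant, or a tuple (op, left, right).
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,  # protected division
}

def evaluate(tree, x):
    """Evaluate an expression tree at the point x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for index in path:
        tree = tree[index]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], subtree)
    return tuple(node)

def crossover(parent_a, parent_b, rng):
    """Child = parent_a with one random subtree replaced by one from parent_b."""
    cut = rng.choice(list(paths(parent_a)))
    donor = get(parent_b, rng.choice(list(paths(parent_b))))
    return replace(parent_a, cut, donor)

def mse(tree, xs, ys):
    """Mean squared error of the tree's predictions against ys."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical candidates: x / (x + 2) and 0.5 * x.
hyperbolic = ("/", "x", ("+", "x", 2.0))
linear = ("*", 0.5, "x")

xs = [1.0, 2.0, 3.0, 4.0]
ys = [x / (x + 2.0) for x in xs]  # synthetic target, not data from the paper

child = crossover(hyperbolic, linear, random.Random(0))
```

In a full GP run, many such children would be generated per generation, scored with a fitness function such as `mse`, and selected probabilistically. A multigenic variant would additionally combine several such trees into one model.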

Section snippets

Review of prior univariate LC models

There have been several surveys, reviews and aggregations of learning curves in the literature. For example, Yelle (1979) surveyed many univariate models of experiential learning and provided a comprehensive review of the extant models. Since then, others including Badiru (1992), Nembhard and Uzumeri (2000a), Anzanello and Fogliatto (2011), Srour et al. (2015), and Jaber (2016) have broadened and further examined the set of models available for researchers and practitioners. Thus, in the current study

Methodology

We employ a GP-based symbolic regression approach to generate potentially useful models and to evaluate the fitness of these models using a dataset from the literature. We consider two independent cohorts in our data, one with learning episodes associated with novice learners (training data), and the other with learning episodes associated with learners having some prior experience (evaluation data). We remark that since we are investigating models of learning behavior, the novices

Results and discussion

The highest-ranked model forms for fitting cumulative work, x, to production rate, y, and cumulative working time, t, to y are summarized in Tables 3 and 4, respectively, along with a summary of the efficiency and stability of each of the high-scoring models generated from MSR. The parametric results for each of the models presented below form multivariate normal distributions, and verify that the single cohort population forms a single cluster and distribution.

The models in Tables 3 and 4 are

Conclusions

In this paper, we address the question of what univariate learning curve functional form or forms are most efficient and stable for measuring human learning responses. We employ a symbolic regression approach using GP to generate and investigate new potential forms developed empirically and naturally from field data. We remark that over the past century, numerous learning curve forms were suggested and added to the literature. Several studies have further compared these models to one another.

Acknowledgements

The authors would like to thank The Leonhard Center at Penn State University for partial funding of this research. We also thank Haochen Xie and Chenmu Wang for their input in the early stages of this project.

References (49)

  • F.N. Andrianasoloa et al.

    Prediction of sunflower grain oil concentration as a function of variety, crop management and environment using statistical models

    European Journal of Agronomy

    (2014)
  • Asher, H. (1956). Cost-quantity relationships in the airframe industry. Rep. No. R-291. Santa Monica, CA: The Rand...
  • A.B. Badiru

    Computational survey of univariate and multivariate learning curve models

    IEEE Trans. Eng. Manage.

    (1992)
  • Y. Barak et al.

    Mathematical analysis of specific anatomic foveal configurations predisposing to the formation of macular holes

    Investigative Ophthalmology & Visual Science

    (2011)
  • G.W. Carr

    Peacetime cost estimating requires new learning curves

    Aviation

    (1946)
  • R. Castellano et al.

    Assessing the gender gap in labour market index: Volatility of results and reliability

    International Journal of Social Economics

    (2015)
  • E.R.F.W. Crossman

    A theory of the acquisition of speed-skill

    Ergonomics

    (1959)
  • E.M. Dar-El et al.

    A dual-phase model for the individual learning process in industrial tasks

    IIE Transactions

    (1995)
  • J.R. DeJong

    The effects of increasing skill on cycle time and its consequences for time standards

    Ergonomics

    (1957)
  • P.F. Delaney et al.

    The strategy specific nature of improvement: The power law applies by strategy within task

    Psychological Science

    (1998)
  • S.E. Derenzo et al.

    Fundamental limits of scintillation detector timing precision

    Physics in Medicine and Biology

    (2014)
  • J.G. Everett et al.

    Learning curve predictors for construction field operations

    Journal of Construction Engineering and Management

    (1994)
  • H. Faris et al.

    A comparison between parametric and non-parametric soft computing approaches to model the temperature of a metal cutting tool

    International Journal of Computer Integrated Manufacturing

    (2016)
  • M. Gharun et al.

    Short-term forecasting of water yield from forested catchments after bushfire: A case study from Southeast Australia

    Water

    (2015)