Evolving GP Classifiers for Streaming Data Tasks with Concept Change and Label Budgets: A Benchmarking Study

Chapter in Handbook of Genetic Programming Applications

Abstract

Streaming data classification poses several challenges that are not typically encountered in offline supervised learning. First, at any training generation only a small subset of the data is accessible, and the data itself is potentially generated by a non-stationary process. Second, requesting labels incurs a cost, so a label budget is enforced. Finally, an anytime classification requirement means that it must always be possible to identify a ‘champion’ classifier for predicting labels as the stream progresses. In this work, we propose a general framework for deploying genetic programming (GP) to streaming data classification under these constraints. The framework consists of a sampling policy and an archiving policy, which enforce the criteria for selecting the data that appears in a data subset. Only the exemplars in the data subset are labeled, and training epochs are performed against the content of the data subset alone. Specific recommendations include supporting GP task decomposition/modularity and performing additional training epochs per data subset. Both recommendations significantly improve on the baseline performance of GP under streaming data with label budgets. Benchmarking issues addressed include the identification of datasets and performance measures.
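
The interplay between the sampling policy, the archiving policy, the label budget and the anytime champion can be illustrated with a minimal sketch. It is not the chapter's implementation: uniform random sampling, random replacement in the data subset, and the `majority_class` stand-in for a GP training epoch are all assumptions made here for illustration, as are the names `run_stream`, `oracle`, `budget`, `subset_size` and `epochs`.

```python
import random
from collections import Counter


def majority_class(subset):
    """Stand-in for one training epoch (assumption, not GP):
    returns the most common label in the data subset as the 'champion'."""
    return Counter(y for _, y in subset).most_common(1)[0][0]


def run_stream(stream, oracle, budget=0.1, subset_size=20, epochs=1,
               train_epoch=majority_class, seed=0):
    """One pass over the stream under a label budget.

    Returns (subset, labels_requested, predictions).
    """
    rng = random.Random(seed)
    subset = []        # labeled data subset: list of (x, y) pairs
    requested = 0      # number of labels requested so far
    champion = None    # current champion; None until the first training epoch
    predictions = []
    for t, x in enumerate(stream, start=1):
        # Anytime requirement: the current champion answers for every exemplar.
        predictions.append(champion)
        # Sampling policy (assumed): uniform random requests, capped so that
        # the fraction of labeled exemplars never exceeds the budget.
        if requested + 1 <= budget * t and rng.random() < budget:
            y = oracle(x)  # the only place a true label is consulted
            requested += 1
            # Archiving policy (assumed): fill the subset, then overwrite
            # a randomly chosen entry.
            if len(subset) < subset_size:
                subset.append((x, y))
            else:
                subset[rng.randrange(subset_size)] = (x, y)
            # Training sees only the labeled data subset; several epochs
            # may be run per subset update.
            for _ in range(epochs):
                champion = train_epoch(subset)
    return subset, requested, predictions
```

For example, `run_stream(range(1000), lambda x: x % 2, budget=0.1)` requests at most 100 labels while still returning one prediction per exemplar. Replacing `train_epoch` with a real learner, or the sampling and archiving rules with more selective ones, changes only the marked lines.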

Notes

  1. Publicly available at http://web.cs.dal.ca/~mheywood/Code/SBB/Stream/StreamData.html.

  2. Gabor Melli. The ‘datgen’ Dataset Generator. http://www.datsetgenerator.com/.

  3. MOA prerelease 2014.03; http://moa.cms.waikato.ac.nz/overview/.

  4. Implying that 10 % of the label budget is consumed in pre-training.

  5. Similar effects were observed for the electricity demand and forest cover type datasets; those results are therefore not explicitly reported.

  6. Similar effects were observed for the electricity and forest cover type datasets.

  7. Other than the monolithic formulation of SBB being subject to the constraint that only one program may represent each class, the two implementations are the same.

  8. Results not shown for brevity.

Acknowledgements

The authors gratefully acknowledge funding provided by the NSERC CRD grant program (Canada).

Author information

Corresponding author

Correspondence to Malcolm I. Heywood.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Vahdat, A., Morgan, J., McIntyre, A.R., Heywood, M.I., Zincir-Heywood, N. (2015). Evolving GP Classifiers for Streaming Data Tasks with Concept Change and Label Budgets: A Benchmarking Study. In: Gandomi, A., Alavi, A., Ryan, C. (eds) Handbook of Genetic Programming Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-20883-1_18

  • DOI: https://doi.org/10.1007/978-3-319-20883-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20882-4

  • Online ISBN: 978-3-319-20883-1

  • eBook Packages: Computer Science, Computer Science (R0)
