Multi- and Many-Threaded Heterogeneous Parallel Grammatical Evolution

Abstract

Several algorithms are suited to inferring human-interpretable models for classification and regression tasks in machine learning, but few can compete with Grammatical Evolution (GE) in power, model expressiveness and ease of implementation. On the other hand, algorithms that iteratively optimize a set of programs of arbitrary complexity, as GE does, may require a prohibitive amount of running time when tackling complex problems. Fortunately, GE can scale to such problems by carefully harnessing the parallel processing of modern heterogeneous systems, taking advantage of traditional multi-core processors and many-core accelerators to speed up the execution by orders of magnitude. This chapter covers the subject of parallel GE, focusing on heterogeneous multi- and many-threaded decomposition in order to achieve a fully parallel implementation in which both breeding and evaluation are parallelized. In the studied benchmarks, the overall parallel implementation ran 68 times faster than the sequential version, with the program-evaluation kernel alone reaching a 350-fold acceleration. Details on how to accomplish this efficiently are given in the context of two well-established open standards for parallel computing: OpenMP and OpenCL. Decomposition strategies, optimization techniques and parallel benchmarks, followed by analyses, are presented in the chapter.
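To make the decomposition concrete, the sketch below shows in plain C with OpenMP the overall shape of one fully parallel GE generation as described above: breeding is split among the CPU threads, while evaluation, the dominant cost, is concentrated in a single bulk call that a heterogeneous setup would offload to an OpenCL device. All names here (Individual, breed, evaluate_population) are illustrative placeholders, not the chapter's actual interface.

    /* Minimal sketch, not the chapter's implementation: breeding parallelized on
     * the CPU with OpenMP, evaluation gathered into one bulk call that a real
     * heterogeneous setup would offload to an OpenCL device. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define GENOME_LEN 32

    typedef struct { unsigned char genome[GENOME_LEN]; double fitness; } Individual;

    /* Toy breeding operator: copy one parent and mutate a single codon.
     * A real GE operator would apply selection, crossover and mutation. */
    static void breed(const Individual *parents, int pop_size, Individual *child, int i)
    {
        *child = parents[i % pop_size];
        child->genome[i % GENOME_LEN] ^= (unsigned char)(i + 1);
    }

    /* Stand-in for the device-side evaluation (an OpenCL kernel in the chapter):
     * here it just sums the codons so the example runs on its own. */
    static void evaluate_population(Individual *pop, int pop_size)
    {
        for (int i = 0; i < pop_size; ++i) {
            double sum = 0.0;
            for (int j = 0; j < GENOME_LEN; ++j)
                sum += pop[i].genome[j];
            pop[i].fitness = sum;
        }
    }

    int main(void)
    {
        enum { POP = 1024 };
        Individual *parents   = calloc(POP, sizeof *parents);
        Individual *offspring = calloc(POP, sizeof *offspring);

        /* Breeding is embarrassingly parallel: each offspring is produced
         * independently, so the loop is split among the CPU threads. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < POP; ++i)
            breed(parents, POP, &offspring[i], i);

        /* Evaluation handled as one bulk operation. */
        evaluate_population(offspring, POP);

        printf("fitness[0] = %g\n", offspring[0].fitness);
        free(parents);
        free(offspring);
        return 0;
    }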


Notes

  1. Put simply, a cache is a very fast but small memory that stores the most recently or most commonly accessed data. When requested data are not in the cache, a fixed-size block containing not only the requested data but also nearby data is copied from main memory into the cache; such a block is known as a cache line or cache block [10]. (A small illustration of why cache-line granularity matters for multi-threaded code is given after these notes.)

  2. Due to accelerator device constraints, the OpenCL programming model forbids synchronization among work-groups, which precludes gathering the partial errors within the evaluation kernel. Since only a fraction of the partial errors remains at the end of the kernel execution, it is preferable to reduce them directly on the host processor (CPU); a sketch of this pattern is given after these notes.

  3. ppi is freely available at http://github.com/daaugusto/ppi.

  4. Note that a single generation would suffice to estimate the execution time of the tasks of the GE algorithm; however, we used two generations in order to make the timeline visualization more appealing.

  5. Speedup is the ratio of the execution time of the algorithm before improvement to its execution time after improvement (formalized after these notes).

  6. Improvement percentage is the difference between the execution times before and after improvement, divided by the execution time of the unimproved version and multiplied by 100 (formalized after these notes).

  7. GPop/s stands for genetic programming operations per second and is equivalent to the number of program “symbols” processed per second (see the formula after these notes).

  8. SIMD stands for Single Instruction, Multiple Data and means that a single instruction is executed on multiple data elements simultaneously.

  9. SPMD stands for Single Program, Multiple Data and means that different compute units of a compute device are able to execute different instructions simultaneously from the same kernel (both concepts are illustrated by the kernel sketch after these notes).
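As a minimal illustration of the cache-line behaviour mentioned in note 1, the C/OpenMP snippet below keeps per-thread accumulators on separate cache lines via padding, avoiding the false sharing that occurs when several threads repeatedly write to the same line [3]. The 64-byte line size and all identifiers are assumptions for the example, not values taken from the chapter.

    /* Illustration only: per-thread partial sums padded to a full cache line so
     * that concurrent updates from different threads never touch the same line
     * (no false sharing). */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CACHE_LINE 64   /* common cache-line size; query the hardware in practice */

    typedef struct {
        double value;
        char pad[CACHE_LINE - sizeof(double)];  /* padding fills the rest of the line */
    } PaddedDouble;

    int main(void)
    {
        int nthreads = omp_get_max_threads();
        PaddedDouble *partial = calloc((size_t)nthreads, sizeof *partial);

        #pragma omp parallel
        {
            int t = omp_get_thread_num();
            for (long i = 0; i < 1000000L; ++i)
                partial[t].value += 1.0;        /* each thread stays on its own cache line */
        }

        double total = 0.0;
        for (int t = 0; t < nthreads; ++t)
            total += partial[t].value;
        printf("total = %.0f\n", total);

        free(partial);
        return 0;
    }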
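The pattern described in note 2 can be sketched as follows, assuming a hypothetical evaluation pipeline in which each work-group reduces its work-items' errors in local memory and writes a single partial sum; the handful of remaining partial sums is then added on the host, because OpenCL provides no global synchronization across work-groups. This is an illustrative OpenCL C kernel, not the chapter's actual code.

    /* One partial sum per work-group; the host finishes the reduction. */
    __kernel void partial_errors(__global const float *errors,  /* one error per training case */
                                 __global float *partial,       /* one slot per work-group */
                                 __local float *scratch,
                                 const unsigned int n)
    {
        const size_t lid = get_local_id(0);
        const size_t gid = get_global_id(0);

        scratch[lid] = (gid < n) ? errors[gid] : 0.0f;
        barrier(CLK_LOCAL_MEM_FENCE);

        /* Tree reduction within the work-group (local size assumed a power of two). */
        for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {
            if (lid < s)
                scratch[lid] += scratch[lid + s];
            barrier(CLK_LOCAL_MEM_FENCE);
        }

        if (lid == 0)
            partial[get_group_id(0)] = scratch[0];
    }

    /* Host side, after reading 'partial' back (e.g. via clEnqueueReadBuffer):
     *     float error = 0.0f;
     *     for (size_t g = 0; g < num_groups; ++g)
     *         error += partial[g];
     */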
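In symbols, the standard definition of speedup referred to in note 5 is

    \[ S = \frac{T_{\text{before}}}{T_{\text{after}}} \]

so, for example, a kernel whose evaluation time drops from 350 s to 1 s achieves S = 350.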
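Likewise, the improvement percentage of note 6 can be written as

    \[ I(\%) = \frac{T_{\text{before}} - T_{\text{after}}}{T_{\text{before}}} \times 100 = \left(1 - \frac{1}{S}\right) \times 100 \]

so a speedup of S = 2 corresponds to a 50% improvement.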
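One common way to compute the GPop/s figure of note 7 (an assumption about the exact accounting, not a formula quoted from the chapter) is

    \[ \text{GPop/s} = \frac{\sum_{i=1}^{|P|} |p_i| \cdot n_{\text{cases}}}{t_{\text{eval}}} \]

where |P| is the population size, |p_i| the number of symbols in program i, n_cases the number of training cases, and t_eval the evaluation time in seconds.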
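Finally, the toy OpenCL C kernel below only makes the SIMD/SPMD distinction of notes 8 and 9 concrete; it is not taken from the chapter. Every work-item executes the same kernel but may follow its own branch (SPMD), while the float4 arithmetic inside maps to vector (SIMD) instructions on hardware that supports them.

    __kernel void axpy4(__global const float4 *x, __global float4 *y, const float a)
    {
        const size_t i = get_global_id(0);

        if (i % 2 == 0)                 /* SPMD: each work-item may take its own branch */
            y[i] = a * x[i] + y[i];     /* SIMD: one operation applied to four lanes    */
        else
            y[i] = x[i];
    }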

References

  1. D.A. Augusto, H.J.C. Barbosa, Symbolic regression via genetic programming, in Proceedings of the Sixth Brazilian Symposium on Neural Networks, vol. 1 (2000), pp. 173–178
  2. D.A. Augusto, H.J.C. Barbosa, Accelerated parallel genetic programming tree evaluation with OpenCL. J. Parallel Distrib. Comput. 73(1), 86–100 (2013)
  3. W.J. Bolosky, M.L. Scott, False sharing and its effect on shared memory performance, in USENIX Experiences with Distributed and Multiprocessor Systems, vol. 4, Sedms ’93 (USENIX Association, Berkeley, CA, 1993), pp. 3–3
  4. R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, R. Menon, Parallel Programming in OpenMP (Morgan Kaufmann, San Francisco, CA, 2001)
  5. L. Deng, D. Yu, Deep learning: methods and applications. Technical Report (2014)
  6. S.S. Haykin, Neural Networks and Learning Machines, 3rd edn. (Pearson Education, Upper Saddle River, NJ, 2009)
  7. W.D. Hillis, G.L. Steele Jr., Data parallel algorithms. Commun. ACM 29, 1170–1183 (1986)
  8. C. Iancu, S. Hofmeyr, F. Blagojević, Y. Zheng, Oversubscription on multicore processors, in 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) (April 2010), pp. 1–11
  9. P. Jääskeläinen, C.S. de La Lama, E. Schnetter, K. Raiskila, J. Takala, H. Berg, pocl: a performance-portable OpenCL implementation. Int. J. Parallel Program. 43(5), 752–785 (2015)
  10. B. Jacob, S. Ng, D. Wang, Memory Systems: Cache, DRAM, Disk (Morgan Kaufmann, San Francisco, CA, 2007)
  11. D. Kaeli, P. Mistry, D. Schaa, D.P. Zhang (eds.), Heterogeneous Computing with OpenCL 2.0, 3rd edn. (Morgan Kaufmann, Boston, 2015)
  12. C. Lameter, NUMA (non-uniform memory access): an overview. Queue 11(7), 40:40–40:51 (2013)
  13. B.W. Lampson, Lazy and speculative execution in computer systems, in OPODIS, ed. by A.A. Shvartsman. Lecture Notes in Computer Science, vol. 4305 (Springer, Berlin, 2006), pp. 1–2
  14. P. Pospichal, E. Murphy, M. O’Neill, J. Schwarz, J. Jaros, Acceleration of grammatical evolution using graphics processing units: computational intelligence on consumer games and graphics hardware, in Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’11 (ACM, New York, NY, 2011), pp. 431–438
  15. D. Robilliard, V. Marion-Poty, C. Fonlupt, Genetic programming on graphics processing units. Genet. Program Evolvable Mach. 10(4), 447 (2009)
  16. I.L.S. Russo, H.S. Bernardino, H.J.C. Barbosa, A massively parallel grammatical evolution technique with OpenCL. J. Parallel Distrib. Comput. 109, 333–349 (2017)
  17. J. Shen, J. Fang, H.J. Sips, A.L. Varbanescu, Performance traps in OpenCL for CPUs, in PDP (IEEE Computer Society, 2013), pp. 38–45
  18. J.E. Smith, A study of branch prediction strategies, in Proceedings of the 8th Annual Symposium on Computer Architecture, ISCA ’81 (IEEE Computer Society Press, Los Alamitos, CA, 1981), pp. 135–148


Acknowledgements

The authors would like to thank the support provided by CNPq (grants 310778/2013-1, 502836/2014-8 and 300458/2017-7), FAPEMIG (grant APQ-03414-15), EU H2020 Programme and MCTI/RNP–Brazil under the HPC4E Project (grant 689772).

Author information

Corresponding author

Correspondence to Amanda Sabatini Dufek.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Dufek, A.S., Augusto, D.A., Barbosa, H.J.C., da Silva Dias, P.L. (2018). Multi- and Many-Threaded Heterogeneous Parallel Grammatical Evolution. In: Ryan, C., O'Neill, M., Collins, J. (eds) Handbook of Grammatical Evolution. Springer, Cham. https://doi.org/10.1007/978-3-319-78717-6_9

  • DOI: https://doi.org/10.1007/978-3-319-78717-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78716-9

  • Online ISBN: 978-3-319-78717-6

  • eBook Packages: Computer Science, Computer Science (R0)
