Learning Without Peeking: Secure Multi-party Computation Genetic Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11036)

Abstract

Genetic Programming (GP) is widely used to build predictive models for defect proneness or development effort. Such predictive modelling often depends on sensitive training data related to past faults or internal resources. We envision a scenario in which revealing the training data constitutes a violation of privacy. To ensure organisational privacy in such a scenario, we propose SMCGP, a method that performs Genetic Programming as Secure Multi-party Computation. In SMCGP, one party uses GP to learn a model of training data provided by another party, without actually knowing each datapoint in the training data. We present an SMCGP approach based on the garbled circuit protocol and evaluate it using two problem sets: a widely studied symbolic regression benchmark, and a GP-based fault localisation technique with real-world fault data from the Defects4J benchmark. The results suggest that SMCGP can be as accurate as normal GP, but the cost of keeping the training data hidden can be execution that is about three orders of magnitude slower.

Notes

  1. It is available from https://oblivc.org.

  2. In practice, our implementation gathers all candidate solutions in a generation and combines them into a single Obliv-C program, to save the compilation overhead. This is similar to the approach taken by the existing GPGPU-based parallelisation approach for GP [14].

  3. http://dces.essex.ac.uk/research/evostar/competitions.html.

  4. Note that, while FLUCCS [27] makes a link between defect prediction and fault localisation via shared features, the GP formulations for the two problems are different. Defect prediction classifies each program element as fault prone or not; fault localisation assigns suspiciousness scores to program elements, aiming to place the faulty element at the top when the elements are ranked by score.

  5. It is known that faults exhibit modal behaviours against fault localisation ranking models learnt by FLUCCS [27]: Mockito-1 may be one such fault that can only be localised well by a small minority of ranking models.
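The distinction drawn in Note 4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element names, the spectrum counts `ef` and `ep` (failing and passing tests that execute an element), the `toy_score` formula, and the threshold are hypothetical and are not taken from the paper or from FLUCCS. The sketch only shows that the same scoring model can be used either to rank elements (fault localisation) or to label them (defect prediction).

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Scorer = Callable[[Features], float]

# Hypothetical program elements with made-up spectrum counts.
ELEMENTS: Dict[str, Features] = {
    "Foo.bar": {"ef": 3, "ep": 1},
    "Foo.baz": {"ef": 1, "ep": 5},
    "Qux.run": {"ef": 0, "ep": 4},
}


def toy_score(f: Features) -> float:
    """Illustrative suspiciousness formula; stands in for an evolved GP model."""
    return f["ef"] / (f["ef"] + f["ep"] + 1.0)


def rank_elements(elements: Dict[str, Features], score: Scorer) -> List[str]:
    """Fault localisation view: order elements by descending suspiciousness."""
    ordered = sorted(elements.items(), key=lambda kv: score(kv[1]), reverse=True)
    return [name for name, _ in ordered]


def rank_of_fault(faulty: str, elements: Dict[str, Features], score: Scorer) -> int:
    """1-based position of the known faulty element in the ranking."""
    return rank_elements(elements, score).index(faulty) + 1


def classify_fault_prone(
    elements: Dict[str, Features], score: Scorer, threshold: float
) -> Dict[str, bool]:
    """Defect prediction view: label each element as fault prone or not."""
    return {name: score(f) > threshold for name, f in elements.items()}


if __name__ == "__main__":
    print(rank_elements(ELEMENTS, toy_score))
    print(rank_of_fault("Foo.bar", ELEMENTS, toy_score))
    print(classify_fault_prone(ELEMENTS, toy_score, 0.5))
```

Under the ranking view, a model is judged by where the faulty element lands (ideally position 1); under the classification view, it is judged by the correctness of each label, which is why the two GP formulations use different fitness functions.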

References

  1. Anati, I., Gueron, S., Johnson, S., Scarlata, V.: Innovative technology for CPU based attestation and sealing. In: Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy, vol. 13 (2013)

  2. Balcan, M., Blum, A., Fine, S., Mansour, Y.: Distributed learning, communication complexity and privacy. In: COLT 2012 - The 25th Annual Conference on Learning Theory, pp. 26.1–26.22 (2012)

  3. Baumann, A., Peinado, M., Hunt, G.: Shielding applications from an untrusted cloud with haven. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation OSDI 2014, pp. 267–283. USENIX Association, Berkeley, CA, USA (2014)

  4. Du, W., Atallah, M.J.: Secure multi-party computation problems and their applications: a review and open problems. In: Proceedings of the 2001 Workshop on New Security Paradigms, pp. 13–22. ACM (2001)

  5. Even, S., Goldreich, O., Lempel, A.: A randomized protocol for signing contracts. Commun. ACM 28(6), 637–647 (1985)

  6. Ferrucci, F., Gravino, C., Oliveto, R., Sarro, F.: Genetic programming for effort estimation: an analysis of the impact of different fitness functions. In: 2010 Second International Symposium on Search Based Software Engineering (SSBSE), pp. 89–98. IEEE (2010)

  7. Forrest, S., Nguyen, T., Weimer, W., Le Goues, C.: A genetic programming approach to automated software repair. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation GECCO 2009, pp. 947–954. ACM (2009)

  8. Fortin, F.A., De Rainville, F.M., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)

  9. Gascón, A., et al.: Privacy-preserving distributed linear regression on high-dimensional data. In: Proceedings on Privacy Enhancing Technologies PPET 2017, vol. 4, pp. 345–364 (2017)

  10. Gupta, T., Fingler, H., Alvisi, L., Walfish, M.: Pretzel: email encryption and provider-supplied functions are compatible. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM 2017, Los Angeles, CA, USA, 21–25 August 2017, pp. 169–182 (2017)

  11. Just, R., Jalali, D., Ernst, M.D.: Defects4J: a database of existing faults to enable controlled testing studies for Java programs. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis ISSTA 2014, pp. 437–440. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2610384.2628055

  12. Kang, D., Sohn, J., Yoo, S.: Empirical evaluation of conditional operators in GP based fault localization. In: Genetic and Evolutionary Computation GECCO 2017, pp. 1295–1302 (2017)

  13. Keijzer, M.: Improving symbolic regression with interval arithmetic and linear scaling. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E., Poli, R., Costa, E. (eds.) EuroGP 2003. LNCS, vol. 2610, pp. 70–82. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36599-0_7

  14. Kim, J., Kim, J., Yoo, S.: GPGPGPU: evaluation of parallelisation of genetic programming using GPGPU. In: Menzies, T., Petke, J. (eds.) SSBSE 2017. LNCS, vol. 10452, pp. 137–142. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66299-2_11

  15. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)

  16. Li, Z., Jing, X.Y., Zhu, X., Zhang, H., Xu, B., Ying, S.: On the multiple sources and privacy preservation issues for heterogeneous defect prediction. IEEE Trans. Softw. Eng. 1 (2017)

  17. Liu, Y., Khoshgoftaar, T.M.: Genetic programming model for software quality classification. In: Proceedings 6th International Symposium on High Assurance Systems Engineering, Special Topic: Impact of Networking, pp. 127–136 (2001)

  18. Mauša, G., Galinac Grbac, T.: Co-evolutionary multi-population genetic programming for classification in software defect prediction. Appl. Soft Comput. 55(C), 331–351 (2017)

  19. McKeen, F., et al.: Innovative instructions and software model for isolated execution. In: Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy HASP 2013, p. 10:1. ACM, New York, NY, USA (2013)

  20. Moore, C., O’Neill, M., O’Sullivan, E., Doröz, Y., Sunar, B.: Practical homomorphic encryption: a survey. In: IEEE International Symposium on Circuits and Systems ISCAS 2014, pp. 2792–2795, June 2014

  21. Peters, F., Menzies, T., Gong, L., Zhang, H.: Balancing privacy and utility in cross-company defect prediction. IEEE Trans. Softw. Eng. 39(8), 1054–1068 (2013)

  22. Poli, R., Langdon, W.B., McPhee, N.F.: A field guide to genetic programming. Published via http://lulu.com, http://www.gp-field-guide.org.uk (2008)

  23. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)

  24. Sarwate, A.D., Chaudhuri, K.: Signal processing and machine learning with differential privacy: algorithms and challenges for continuous data. IEEE Signal Process. 30(5), 86–94 (2013)

  25. Schuster, F., et al.: VC3: trustworthy data analytics in the cloud using SGX. In: 2015 IEEE Symposium on Security and Privacy, pp. 38–54, May 2015

  26. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)

  27. Sohn, J., Yoo, S.: FLUCCS: using code and change metrics to improve fault localisation. In: Proceedings of the International Symposium on Software Testing and Analysis ISSTA 2017, pp. 273–283. ACM, July 2017

  28. Songhori, E.M., Hussain, S.U., Sadeghi, A.R., Schneider, T., Koushanfar, F.: Tinygarble: highly compressed and scalable sequential garbled circuits. In: IEEE Symposium on Security and Privacy SSP 2015, pp. 411–428, May 2015

  29. Tian, L., Jayaraman, B., Gu, Q., Evans, D.: Aggregating private sparse learning models using multi-party computation. In: NIPS Workshop on Private Multi-Party Machine Learning, PMPML 2016 (2016)

  30. Uy, N.Q., Hoai, N.X., O’Neill, M., McKay, R.I., Galván-López, E.: Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolvable Mach. 12(2), 91–119 (2011)

  31. Vladislavleva, E.J., Smits, G.F., den Hertog, D.: Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Trans. Evol. Comput. 13(2), 333–349 (2009)

  32. Weimer, W., Nguyen, T., Goues, C.L., Forrest, S.: Automatically finding patches using genetic programming. In: Proceedings of the 31st IEEE International Conference on Software Engineering ICSE 2009, pp. 364–374. IEEE, May 2009

  33. White, D.R., et al.: Better GP benchmarks: community survey results and proposals. Genet. Program. Evolvable Mach. 14(1), 3–29 (2013)

  34. Wong, W.E., Gao, R., Li, Y., Abreu, R., Wotawa, F.: A survey on software fault localization. IEEE Trans. Softw. Eng. 42(8), 707 (2016)

  35. Yao, A.C.C.: How to generate and exchange secrets. In: Proceedings of the 27th Annual Symposium on Foundations of Computer Science SFCS 1986, pp. 162–167. IEEE Computer Society, Washington, DC, USA (1986)

  36. Yoo, S.: Evolving human competitive spectra-based fault localisation techniques. In: Fraser, G., Teixeira de Souza, J. (eds.) SSBSE 2012. LNCS, vol. 7515, pp. 244–258. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33119-0_18

  37. Zahur, S., Evans, D.: Obliv-C: a language for extensible data-oblivious computation. IACR Cryptol. ePrint Arch. 2015, 1153 (2015)

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (Grant No. NRF-2016R1C1B1011042).

Author information

Corresponding authors

Correspondence to Jinhan Kim or Shin Yoo.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kim, J., Epitropakis, M.G., Yoo, S. (2018). Learning Without Peeking: Secure Multi-party Computation Genetic Programming. In: Colanzi, T., McMinn, P. (eds) Search-Based Software Engineering. SSBSE 2018. Lecture Notes in Computer Science, vol 11036. Springer, Cham. https://doi.org/10.1007/978-3-319-99241-9_13

  • DOI: https://doi.org/10.1007/978-3-319-99241-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99240-2

  • Online ISBN: 978-3-319-99241-9

  • eBook Packages: Computer Science, Computer Science (R0)
