Abstract
Genetic Programming (GP) is widely used to build predictive models for defect proneness or development effort. Such predictive modelling often depends on sensitive data, related to past faults or internal resources, as training data. We envision a scenario in which revealing the training data constitutes a violation of privacy. To ensure organisational privacy in such a scenario, we propose SMCGP, a method that performs Genetic Programming as Secure Multiparty Computation. In SMCGP, one party uses GP to learn a model of training data provided by another party, without actually knowing the individual datapoints in the training data. We present an SMCGP approach based on the garbled circuit protocol, which is evaluated using two problem sets: a widely studied symbolic regression benchmark, and a GP-based fault localisation technique with real-world fault data from the Defects4J benchmark. The results suggest that SMCGP can be as accurate as normal GP, but that keeping the training data hidden can make execution about three orders of magnitude slower.
Notes
- 1.
It is available from https://oblivc.org.
- 2.
In practice, our implementation gathers all candidate solutions in a generation and combines them into a single Obliv-C program to save compilation overhead. This is similar to the approach taken by an existing GPGPU-based parallelisation of GP [14].
- 3.
- 4.
Note that, while FLUCCS [27] links defect prediction and fault localisation via shared features, the GP formulations for the two problems are different. Defect prediction classifies each program element as fault prone or not, whereas fault localisation assigns suspiciousness scores to program elements, aiming to place the faulty element at the top when elements are ranked by score.
- 5.
It is known that faults exhibit modal behaviours against fault localisation ranking models learnt by FLUCCS [27]: Mockito-1 may be one such fault, which can only be localised well by a small minority of ranking models.
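The distinction drawn in note 4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element names, the two features (counts of failing and passing tests that execute an element), and the toy scoring model are hypothetical and are not taken from the paper or from FLUCCS.

```python
from typing import Callable, Dict, List, Tuple

Features = Tuple[int, int]  # hypothetical: (failing tests, passing tests) covering an element
Model = Callable[[Features], float]


def toy_model(features: Features) -> float:
    """Hypothetical stand-in for an evolved GP expression."""
    failing, passing = features
    total = failing + passing
    return failing / total if total else 0.0


def classify(elements: Dict[str, Features], model: Model,
             threshold: float = 0.5) -> Dict[str, bool]:
    """Defect prediction view: label each element as fault prone or not."""
    return {name: model(f) >= threshold for name, f in elements.items()}


def rank(elements: Dict[str, Features], model: Model) -> List[str]:
    """Fault localisation view: order elements by descending suspiciousness."""
    return sorted(elements, key=lambda name: model(elements[name]), reverse=True)


def rank_of_fault(elements: Dict[str, Features], model: Model, faulty: str) -> int:
    """1-based position of the known faulty element in the ranking."""
    return rank(elements, model).index(faulty) + 1


# Hypothetical coverage data for three methods; "m2" plays the faulty element.
elements = {"m1": (0, 5), "m2": (2, 1), "m3": (1, 4)}
labels = classify(elements, toy_model)
ranking = rank(elements, toy_model)
position = rank_of_fault(elements, toy_model, "m2")
```

In the classification view the output is one Boolean label per element, while in the ranking view only the relative order matters, so a fitness function for fault localisation would typically reward a low `position` for the known faulty element.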
References
Anati, I., Gueron, S., Johnson, S., Scarlata, V.: Innovative technology for CPU based attestation and sealing. In: Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy, vol. 13 (2013)
Balcan, M., Blum, A., Fine, S., Mansour, Y.: Distributed learning, communication complexity and privacy. In: COLT 2012 - The 25th Annual Conference on Learning Theory, pp. 26.1–26.22 (2012)
Baumann, A., Peinado, M., Hunt, G.: Shielding applications from an untrusted cloud with Haven. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation OSDI 2014, pp. 267–283. USENIX Association, Berkeley, CA, USA (2014)
Du, W., Atallah, M.J.: Secure multi-party computation problems and their applications: a review and open problems. In: Proceedings of the 2001 Workshop on New Security Paradigms, pp. 13–22. ACM (2001)
Even, S., Goldreich, O., Lempel, A.: A randomized protocol for signing contracts. Commun. ACM 28(6), 637–647 (1985)
Ferrucci, F., Gravino, C., Oliveto, R., Sarro, F.: Genetic programming for effort estimation: an analysis of the impact of different fitness functions. In: 2010 Second International Symposium on Search Based Software Engineering (SSBSE), pp. 89–98. IEEE (2010)
Forrest, S., Nguyen, T., Weimer, W., Le Goues, C.: A genetic programming approach to automated software repair. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation GECCO 2009, pp. 947–954. ACM (2009)
Fortin, F.A., De Rainville, F.M., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)
Gascón, A., et al.: Privacy-preserving distributed linear regression on high-dimensional data. In: Proceedings on Privacy Enhancing Technologies PPET 2017, vol. 4, pp. 345–364 (2017)
Gupta, T., Fingler, H., Alvisi, L., Walfish, M.: Pretzel: email encryption and provider-supplied functions are compatible. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM 2017, Los Angeles, CA, USA, 21–25 August 2017, pp. 169–182 (2017)
Just, R., Jalali, D., Ernst, M.D.: Defects4J: a database of existing faults to enable controlled testing studies for Java programs. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis ISSTA 2014, pp. 437–440. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2610384.2628055
Kang, D., Sohn, J., Yoo, S.: Empirical evaluation of conditional operators in GP based fault localization. In: Genetic and Evolutionary Computation GECCO 2017, pp. 1295–1302 (2017)
Keijzer, M.: Improving symbolic regression with interval arithmetic and linear scaling. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E., Poli, R., Costa, E. (eds.) EuroGP 2003. LNCS, vol. 2610, pp. 70–82. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36599-0_7
Kim, J., Kim, J., Yoo, S.: GPGPGPU: evaluation of parallelisation of genetic programming using GPGPU. In: Menzies, T., Petke, J. (eds.) SSBSE 2017. LNCS, vol. 10452, pp. 137–142. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66299-2_11
Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
Li, Z., Jing, X.Y., Zhu, X., Zhang, H., Xu, B., Ying, S.: On the multiple sources and privacy preservation issues for heterogeneous defect prediction. IEEE Trans. Softw. Eng. 1 (2017)
Liu, Y., Khoshgoftaar, T.M.: Genetic programming model for software quality classification. In: Proceedings 6th International Symposium on High Assurance Systems Engineering, Special Topic: Impact of Networking, pp. 127–136 (2001)
Mauša, G., Galinac Grbac, T.: Co-evolutionary multi-population genetic programming for classification in software defect prediction. Appl. Soft Comput. 55(C), 331–351 (2017)
McKeen, F., et al.: Innovative instructions and software model for isolated execution. In: Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy HASP 2013, p. 10:1. ACM, New York, NY, USA (2013)
Moore, C., O’Neill, M., O’Sullivan, E., Doröz, Y., Sunar, B.: Practical homomorphic encryption: a survey. In: IEEE International Symposium on Circuits and Systems ISCAS 2014, pp. 2792–2795, June 2014
Peters, F., Menzies, T., Gong, L., Zhang, H.: Balancing privacy and utility in cross-company defect prediction. IEEE Trans. Softw. Eng. 39(8), 1054–1068 (2013)
Poli, R., Langdon, W.B., McPhee, N.F.: A field guide to genetic programming. Published via http://lulu.com, http://www.gp-field-guide.org.uk (2008)
Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)
Sarwate, A.D., Chaudhuri, K.: Signal processing and machine learning with differential privacy: algorithms and challenges for continuous data. IEEE Signal Process. Mag. 30(5), 86–94 (2013)
Schuster, F., et al.: VC3: trustworthy data analytics in the cloud using SGX. In: 2015 IEEE Symposium on Security and Privacy, pp. 38–54, May 2015
Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)
Sohn, J., Yoo, S.: FLUCCS: using code and change metrics to improve fault localisation. In: Proceedings of the International Symposium on Software Testing and Analysis ISSTA 2017, pp. 273–283. ACM, July 2017
Songhori, E.M., Hussain, S.U., Sadeghi, A.R., Schneider, T., Koushanfar, F.: TinyGarble: highly compressed and scalable sequential garbled circuits. In: IEEE Symposium on Security and Privacy SSP 2015, pp. 411–428, May 2015
Tian, L., Jayaraman, B., Gu, Q., Evans, D.: Aggregating private sparse learning models using multi-party computation. In: NIPS Workshop on Private Multi-Party Machine Learning, PMPML 2016 (2016)
Uy, N.Q., Hoai, N.X., O’Neill, M., McKay, R.I., Galván-López, E.: Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolvable Mach. 12(2), 91–119 (2011)
Vladislavleva, E.J., Smits, G.F., den Hertog, D.: Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming. IEEE Trans. Evol. Comput. 13(2), 333–349 (2009)
Weimer, W., Nguyen, T., Goues, C.L., Forrest, S.: Automatically finding patches using genetic programming. In: Proceedings of the 31st IEEE International Conference on Software Engineering ICSE 2009, pp. 364–374. IEEE, May 2009
White, D.R., et al.: Better GP benchmarks: community survey results and proposals. Genet. Program. Evolvable Mach. 14(1), 3–29 (2013)
Wong, W.E., Gao, R., Li, Y., Abreu, R., Wotawa, F.: A survey on software fault localization. IEEE Trans. Softw. Eng. 42(8), 707 (2016)
Yao, A.C.C.: How to generate and exchange secrets. In: Proceedings of the 27th Annual Symposium on Foundations of Computer Science SFCS 1986, pp. 162–167. IEEE Computer Society, Washington, DC, USA (1986)
Yoo, S.: Evolving human competitive spectra-based fault localisation techniques. In: Fraser, G., Teixeira de Souza, J. (eds.) SSBSE 2012. LNCS, vol. 7515, pp. 244–258. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33119-0_18
Zahur, S., Evans, D.: Obliv-C: a language for extensible data-oblivious computation. IACR Cryptol. ePrint Arch. 2015, 1153 (2015)
Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (Grant No. NRF-2016R1C1B1011042).
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, J., Epitropakis, M.G., Yoo, S. (2018). Learning Without Peeking: Secure Multi-party Computation Genetic Programming. In: Colanzi, T., McMinn, P. (eds) Search-Based Software Engineering. SSBSE 2018. Lecture Notes in Computer Science, vol 11036. Springer, Cham. https://doi.org/10.1007/978-3-319-99241-9_13
DOI: https://doi.org/10.1007/978-3-319-99241-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99240-2
Online ISBN: 978-3-319-99241-9
eBook Packages: Computer Science, Computer Science (R0)