ABSTRACT
Symbolic Regression is a powerful data-driven technique that searches for mathematical expressions that explain the relationship between input variables and a target of interest. Due to its efficiency and flexibility, Genetic Programming can be seen as the standard search technique for Symbolic Regression. However, the conventional Genetic Programming algorithm requires storing all data in a central location, which is not always feasible given growing concerns about data privacy and security. While privacy-preserving research has advanced recently and might offer a solution to this problem, its application to Symbolic Regression remains largely unexplored. Furthermore, existing work focuses only on the horizontally partitioned setting, in which parties hold different samples of the same features, whereas the vertically partitioned setting, another popular scenario in which parties hold different features of the same samples, has yet to be investigated. Herein, we propose an approach that employs a privacy-preserving technique called Secure Multiparty Computation to enable parties to jointly build Symbolic Regression models in the vertical scenario without revealing private data. Preliminary experimental results indicate that our proposed method delivers performance comparable to the centralized solution while safeguarding data privacy.
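The abstract does not spell out the protocol, so the following is only a minimal sketch of one common building block of Secure Multiparty Computation, additive secret sharing, and not necessarily the construction used in this work. It assumes two parties in the vertical setting: party A privately holds feature x1 and party B privately holds feature x2 for the same sample. They jointly evaluate a candidate expression c_a * x1 + c_b * x2 with public integer coefficients, and only the final value is reconstructed. The names `MODULUS`, `SCALE`, and `joint_linear_eval` are hypothetical, chosen for illustration.

```python
import secrets
from typing import Tuple

MODULUS = 2**61 - 1  # assumption: modulus for share arithmetic (illustrative)
SCALE = 10**4        # assumption: fixed-point scale for real-valued features


def encode(x: float) -> int:
    """Map a real value to a fixed-point element modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real value (upper half = negatives)."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value: int) -> Tuple[int, int]:
    """Split a value into two additive shares; each share alone looks random."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares."""
    return (s0 + s1) % MODULUS


def joint_linear_eval(x_a: float, x_b: float, c_a: int, c_b: int) -> float:
    """Evaluate c_a * x_a + c_b * x_b on shares.

    x_a is party A's private feature and x_b is party B's private feature.
    The integer coefficients are public (part of the candidate expression).
    """
    # Each party secret-shares its own feature and sends one share to the peer.
    a_keep, a_send = share(encode(x_a))
    b_send, b_keep = share(encode(x_b))

    # Each party computes locally on the shares it holds; addition and
    # multiplication by a public constant need no communication.
    partial_a = (c_a * a_keep + c_b * b_send) % MODULUS  # held by party A
    partial_b = (c_a * a_send + c_b * b_keep) % MODULUS  # held by party B

    # Only the two partial results are exchanged and recombined.
    return decode(reconstruct(partial_a, partial_b))
```

For example, `joint_linear_eval(1.5, 0.25, 2, -3)` evaluates 2 * x1 - 3 * x2 without either party seeing the other's raw feature. Multiplying two secret values together, as nonlinear expressions require, needs additional machinery that this sketch omits.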