Robust password security: a genetic programming approach with imbalanced dataset handling

Andelić, Nikola; Baressi Šegota, Sandi; Car, Zlatan

doi:10.1007/s10207-024-00814-2


Regular Contribution
Published: 07 February 2024


International Journal of Information Security

Nikola Andelić¹,
Sandi Baressi Šegota¹ &
Zlatan Car¹


Abstract

Developing an artificial intelligence (AI) based method for determining password strength is important because it strengthens defenses against unauthorized access. AI can analyze complex patterns and trends, which allows weak passwords and potential vulnerabilities to be identified more effectively than with traditional methods. This proactive approach helps users and organizations strengthen their security posture and reduces the risk of data breaches and unauthorized intrusions. In this paper, the genetic programming symbolic classifier (GPSC) was applied to a publicly available dataset to obtain a set of symbolic expressions (SEs) that classify password strength with high accuracy. Because the dataset is imbalanced between classes, various oversampling and undersampling techniques were applied to it. The optimal GPSC hyperparameter values were found using the random hyperparameter value search (RHVS) method, and the algorithm was trained using fivefold cross-validation (5FCV). The obtained SEs were evaluated with accuracy, area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. The SEs achieved a high classification accuracy (0.99) on the balanced dataset variations. Applying all SEs to the entire original imbalanced dataset yielded the same accuracy.
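The abstract does not list which balancing techniques were used. As a hedged illustration of the two families it names, the sketch below implements the simplest variant of each in plain Python:

- random oversampling, which duplicates randomly chosen samples of the smaller classes;
- random undersampling, which discards randomly chosen samples of the larger classes.

The function names and the toy data are hypothetical. The techniques applied in the paper may instead be SMOTE-type methods, which synthesize new minority samples by interpolating between neighbours rather than duplicating existing ones.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random samples of each smaller class until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has as many
    samples as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

# Toy imbalanced data: hypothetical feature vectors with class labels 0, 1, 2.
X = [[6, 0], [7, 1], [8, 1], [9, 2], [10, 2], [12, 3], [14, 4]]
y = [0, 1, 1, 1, 1, 2, 2]
print(Counter(random_oversample(X, y)[1]))   # every class now has 4 samples
print(Counter(random_undersample(X, y)[1]))  # every class now has 1 sample
```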



Acknowledgements

This research was (partly) supported by the CEEPUS network CIII-HR-0108, the European Regional Development Fund under Grant KK.01.1.1.01.0009 (DATACROSS), the Erasmus+ project WICT under Grant 2021-1-HR01-KA220-HED-000031177, and the University of Rijeka Scientific Grants uniri-mladi-technic-22-61 and uniri-tehnic-18-275-1447.

Author information

Authors and Affiliations

Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000, Rijeka, Croatia
Nikola Andelić, Sandi Baressi Šegota & Zlatan Car

Corresponding author

Correspondence to Nikola Andelić.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix A.1. The modified mathematical functions used in GPSC

In the description of the GPSC and RHVS methods, it was mentioned that the division, square root, natural logarithm, and base-2 and base-10 logarithm functions had to be modified to avoid generating infinity or not-a-number (NaN) values. The modified division function is defined as:

$$y_{\text{DIV}}(x_1, x_2) = \begin{cases} x_1/x_2, & |x_2| > 0.001 \\ 1, & |x_2| \le 0.001 \end{cases} \tag{8}$$
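Eq. (8) can be sketched in Python as below. This is a minimal scalar illustration, not the authors' code. The function name `protected_div` is hypothetical, and GP libraries usually apply such protected operators elementwise to whole arrays of samples.

```python
def protected_div(x1: float, x2: float) -> float:
    """Eq. (8): x1 / x2 when |x2| > 0.001, otherwise the constant 1."""
    return x1 / x2 if abs(x2) > 0.001 else 1.0

print(protected_div(6.0, 3.0))   # 2.0
print(protected_div(5.0, 0.0))   # 1.0 instead of ZeroDivisionError
print(protected_div(5.0, 1e-4))  # 1.0 instead of a huge quotient
```

Returning a fixed constant near zero keeps every evolved expression finite, so a population member is never lost to an exception or an infinite fitness value.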

The square root function is defined as:

$$y_{\text{SQRT}}(x) = \begin{cases} \sqrt{|x|}, & |x| > 0.001 \end{cases} \tag{9}$$

The natural logarithm and the logarithms with bases 2 and 10 are defined as:

$$y_{i}(x) = \begin{cases} \log_i |x|, & |x| > 0.001 \\ 0, & |x| \le 0.001 \end{cases}, \quad i \in \{e, 2, 10\} \tag{10}$$
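A minimal Python sketch of Eq. (10) follows. It is an illustration under assumptions, not the authors' implementation: the name `protected_log` and the string keys used to select the base are hypothetical choices.

```python
import math

# Base label -> stdlib logarithm (the string keys are an illustrative choice).
_LOGS = {"e": math.log, "2": math.log2, "10": math.log10}

def protected_log(x: float, base: str = "e") -> float:
    """Eq. (10): log_i |x| if |x| > 0.001, else 0, for i in {e, 2, 10}."""
    if abs(x) > 0.001:
        return _LOGS[base](abs(x))
    return 0.0

print(protected_log(-8.0, "2"))   # 3.0, the sign of x is discarded by |x|
print(protected_log(0.0, "10"))   # 0.0 instead of a math domain error
```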

Appendix A.2. How to obtain and use the SEs from this research

Because of the large number of SEs obtained in this research, they are not shown in the paper. The SEs can be downloaded from the GitHub repository (https://github.com/nandelic2022/PasswordStrengthEquations.git). After downloading, the SEs are used as follows:

1. From the initial dataset, define the input variables and the output variable.
2. Use the input variables to calculate the output of each SE.
3. Pass the output of each SE through the sigmoid function to determine whether the dataset sample belongs to the class or not.
4. Use the previously mentioned evaluation metrics to calculate the performance of the SEs.
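The four steps above can be sketched for a single binary SE as follows. Everything specific here is an assumption for illustration:

- the expression `hypothetical_se` and its two input features are invented;
- the 0.5 decision threshold is assumed;
- the file format of the GitHub repository is not described in this text.

None of these comes from the repository. The AUC is computed as the probability that a randomly chosen positive sample receives a higher sigmoid score than a randomly chosen negative one, with ties counting one half. This quantity equals the area under the ROC curve.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return ez / (1.0 + ez)

def hypothetical_se(x0: float, x1: float) -> float:
    """HYPOTHETICAL symbolic expression, not one of the published SEs.
    x0 and x1 stand for two assumed input features."""
    return 0.8 * x0 + 0.5 * x1 - 8.0

def classify(samples, se, threshold=0.5):
    """Steps 2-3: evaluate the SE on each sample, map it through the sigmoid."""
    scores = [sigmoid(se(*x)) for x in samples]
    preds = [1 if s >= threshold else 0 for s in scores]
    return scores, preds

def evaluate(y_true, y_pred, scores):
    """Step 4: accuracy, precision, recall, F1-score and ROC AUC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    # AUC = P(score of a random positive > score of a random negative), ties 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}

# Step 1: toy input variables (x0, x1) and output variable (1 = in the class).
X = [(6, 0), (8, 1), (12, 2), (14, 3), (7, 0), (10, 4), (9, 0)]
y = [0, 0, 1, 1, 0, 1, 1]
scores, preds = classify(X, hypothetical_se)
print(preds)  # [0, 0, 1, 1, 0, 1, 0]
print(evaluate(y, preds, scores))
```

On this toy data the AUC is 1.0, even though one positive sample falls below the 0.5 threshold and is misclassified. This illustrates why threshold-based metrics (accuracy, precision, recall, F1-score) are reported alongside AUC. The sketch covers one binary SE only. How the published SEs are combined across password-strength classes is not described here.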

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Andelić, N., Baressi Šegota, S. & Car, Z. Robust password security: a genetic programming approach with imbalanced dataset handling. Int. J. Inf. Secur. (2024). https://doi.org/10.1007/s10207-024-00814-2


Published: 07 February 2024
DOI: https://doi.org/10.1007/s10207-024-00814-2
