Numerical sensitive data recognition based on hybrid gene expression programming for active distribution networks
Introduction
As an important part of information security in active distribution networks, data security has received increasing attention from researchers in recent years [1], [2]. However, data security in active distribution networks is more complicated than in traditional distribution networks because of their complex network architecture and frequent data interaction.
With the extensive application of advanced information and communication technologies such as wireless communication and the Internet of Things, active distribution networks face an increasingly serious threat of viruses, Trojan horses and hacker attacks from the Internet [3], [4], [5], [6]. In particular, with the continued construction of a strong smart grid, the active distribution network has a more complex access environment, flexible and diverse access methods, a large number of intelligent access terminals and massive, dynamically distributed access data. Notably, as a large number of distributed loads are connected to the active distribution network, the interaction between the power grid and its users is greatly enhanced. As a result, users’ electricity data and equipment state monitoring data can be stolen, modified or injected with bad data during network communication. Therefore, ensuring the confidentiality, integrity and non-repudiation of business data has become a research hotspot for data security protection in active distribution networks.
Traditional data security protection methods, such as data encryption [7] and access control [8], place high demands on the computing power, storage capacity and network transmission bandwidth of users and servers in active distribution networks. More importantly, they cannot identify the business data transmitted in the active distribution network at the content level. These methods are therefore passive protection. To address this, many researchers have proposed intelligent data filtering based on content recognition [9], [10], which achieves content-level security protection and, compared with passive measures, is a form of proactive protection. However, existing data content filtering algorithms are based on text classification and focus on text, so they cannot effectively prevent the leakage of numerical data from SCADA systems, AMI systems or smart meters in active distribution networks.
Gene expression programming (GEP), first proposed by Candida Ferreira [11], is a new evolutionary algorithm. GEP has powerful classification and function mining capabilities [12], [13], which make it well suited to sensitive data identification. Therefore, in this paper, we propose a novel parallel numerical data recognition algorithm for active distribution networks based on feature selection and improved gene expression programming. It combines the advantages of rough sets and gene expression programming to better protect the security of numerical data transmission in active distribution networks. The major contributions of our work are as follows:
- To reduce the complexity of the sensitive data recognition model, this paper proposes a rough feature selection algorithm based on average importance measurement (RFS-AIM). Its purpose is to quantitatively analyze the importance of each feature retained after reduction to the final decision feature.
- On the basis of RFS-AIM, we propose a sensitive data recognition function mining algorithm based on feature selection and improved gene expression programming (SDR-IGEP). By using the concept of chromosome similarity, the algorithm improves the genetic operations of traditional GEP and prevents the GEP population from falling into a local optimum.
- An incremental mining algorithm of the sensitive data recognition function based on global function fitting (ISDR-GFF) is proposed to address sensitive data recognition function mining over the growing volume of data in active distribution networks. This algorithm constructs a parallel function mining architecture based on grain granulation. In addition, a multi-population grafting operation based on population similarity is proposed to improve population diversity on the computing nodes and to increase the convergence speed of the GEP population.
- Experimental results on IEEE benchmark datasets and real datasets show that the proposed algorithms outperform other traditional algorithms in terms of the precision, recall, index, accuracy and specificity of sensitive data recognition, as well as the average running time and speedup.
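To make the rough-set idea behind feature importance concrete, the following is a minimal sketch of the classical positive-region dependency degree, γ_C(D) = |POS_C(D)| / |U|, and the resulting significance of a feature a, γ_C(D) − γ_{C−{a}}(D). It illustrates the textbook measure only and is not the average importance measurement of RFS-AIM, whose definition is not given in this excerpt. The feature names (`voltage`, `load`, `hour`), the decision attribute `sensitive` and the toy table are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, decision):
    """Classical rough-set dependency degree: |POS_C(D)| / |U|.

    A row is in the positive region when every row sharing its
    condition-attribute values also shares its decision value.
    """
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def significance(rows, cond_attrs, decision):
    """Drop in dependency degree when each attribute is removed in turn."""
    full = dependency(rows, cond_attrs, decision)
    return {
        a: full - dependency(rows, [b for b in cond_attrs if b != a], decision)
        for a in cond_attrs
    }

# Hypothetical toy decision table (already discretized).
ROWS = [
    {"voltage": "high", "load": "high", "hour": "day",   "sensitive": 1},
    {"voltage": "high", "load": "low",  "hour": "day",   "sensitive": 0},
    {"voltage": "low",  "load": "high", "hour": "night", "sensitive": 1},
    {"voltage": "low",  "load": "low",  "hour": "night", "sensitive": 0},
]
FEATURES = ["voltage", "load", "hour"]

if __name__ == "__main__":
    print(dependency(ROWS, FEATURES, "sensitive"))
    print(significance(ROWS, FEATURES, "sensitive"))
```

In this toy table only `load` separates the two decision values, so removing it collapses the positive region while removing either of the other features changes nothing. A feature selection step of this kind would keep `load` and could discard the rest.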
The remainder of this paper is organized as follows. Section 2 gives an overview of the related work. Section 3 presents the rough feature selection algorithm based on average importance measurement. Section 4 proposes a sensitive data recognition function mining algorithm based on feature selection and improved gene expression programming. Section 5 designs an incremental mining algorithm of the sensitive data recognition function based on global function fitting. To evaluate the performance of the proposed algorithms, experimental results and analyses on IEEE benchmark datasets and real load datasets are given in Section 6. Conclusions are drawn in the last section.
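As background for the GEP-based algorithms outlined above, the sketch below shows how a standard GEP gene (a K-expression) is decoded breadth-first into an expression tree and evaluated. It covers plain GEP decoding only, not the improved genetic operators of SDR-IGEP. The function set, the terminal names `a` to `d` and the example gene are assumptions chosen for illustration.

```python
import operator
from collections import deque

# Hypothetical function set: symbol -> (callable, arity).
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
}

def decode(gene):
    """Decode a K-expression breadth-first into a nested [symbol, children] tree.

    Symbols are read left to right and attached level by level. Symbols left
    over after every function has its arguments form the non-coding region.
    """
    root = [gene[0], []]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        arity = FUNCTIONS[node[0]][1] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root

def evaluate(node, env):
    """Evaluate a decoded tree, looking terminal values up in env."""
    symbol, children = node
    if symbol in FUNCTIONS:
        fn, _ = FUNCTIONS[symbol]
        return fn(*(evaluate(child, env) for child in children))
    return env[symbol]

# Head "*+-a" (length h = 4) and tail "bcdab" (length t = h * (2 - 1) + 1 = 5).
GENE = "*+-abcdab"

if __name__ == "__main__":
    tree = decode(GENE)  # encodes (a + b) * (c - d); the trailing "ab" is unused
    print(evaluate(tree, {"a": 1, "b": 2, "c": 5, "d": 3}))
```

Because the tail holds only terminals and has length t = h(n − 1) + 1 for maximum arity n, decoding never runs past the end of the gene. Every chromosome produced by mutation or recombination therefore decodes to a valid expression.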
Section snippets
Data security of smart grid
The access of various distributed energy sources and flexible loads makes the interaction of various intelligent terminals and users in smart grid more and more frequent and the distribution of data leakage points more and more extensive. Due to the insecurity of smart grid, the data in the cyber and physical system of the distribution network must address numerous security attacks and threats. Attia et al. [14] presented an intrusion detection system architecture for detecting illegal attacks
Rough feature selection algorithm based on average importance measurement
An active distribution network is a typical cyber-physical system. The sources of, and interaction between, the information flow and power flow in an active distribution network are shown in Fig. 1 [32]. Fig. 1 shows that the data in the active distribution network come from a wide range of sources, mainly power distribution SCADA or DMS systems, advanced metering infrastructure (AMI), smart meters and so on. There are many types of data, including the status data of distribution lines, alarm
Sensitive data recognition function mining algorithm based on feature selection and improved gene expression programming
Data from distribution SCADA systems, various status monitoring systems, AMIs, smart meters, and operational logs are the core assets of active distribution networks. These data come from various aspects such as power monitoring, equipment operation, and power consumption. In general, they are characterized by having a wide range of sources, having a large scale, and being composed of complex types. However, with the widespread use of various types of wireless communication technologies in
Incremental mining algorithm of the sensitive data recognition function based on global function fitting
The operation process of the active distribution network is not invariable. Because of the unstable output of the distributed generation and the strong load fluctuation, all types of electrical information and topological data in the operation process of the active distribution network change dynamically. Meanwhile, the active distribution network has a wide range of data sources and a wide geographical distribution. There are many system parameters that affect the safe and stable operation of
Experimental environment
To demonstrate the effectiveness and feasibility of the proposed algorithms, the related experiments are performed in a laboratory environment. The hardware in the experiments includes five algorithm servers and one management server. The hardware and software configurations are shown in Tables 1 and 2, respectively.
In Table 1, the algorithm server is used mainly to execute the sensitive data recognition function mining algorithm, and the management server is used mainly to manage the
Conclusions
The safe and efficient transmission of data is critical to the safe and stable operation of active distribution network business systems. The existing data transmission security protection in an active distribution network focuses on unstructured and structured data such as text and database, and is mainly based on centralized text classification algorithms and traditional security defense methods such as access control and encryption. Therefore, to solve the intelligent recognition of
CRediT authorship contribution statement
Song Deng: Conceptualization, Methodology, Software, Writing - original draft, Formal analysis, Funding acquisition. Xiangpeng Xie: Data curation, Writing - review & editing. Changan Yuan: Resources, Investigation. Lechan Yang: Investigation. Xindong Wu: Project administration, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We would like to thank the anonymous reviewers for their comments and constructive suggestions, which have improved the paper. This work is sponsored by the National Natural Science Foundation of PR China (No. 51977113, 51507084) and the Science Foundation of Nanjing University of Posts and Telecommunications (NUPTSF), China (No. NY219095).
References (40)
- Smart grid cyber security for Europe. Energy Policy (2011)
- et al. Fuzzy c-means clustering based on weights and gene expression programming. Pattern Recognit. Lett. (2017)
- et al. An efficient intrusion detection system against cyber-physical attacks in the smart grid. Comput. Electr. Eng. (2018)
- et al. Distributed filtering under false data injection attacks. Automatica (2019)
- et al. Identification of vulnerable node clusters against false data injection attack in an AMI based smart grid. Inf. Syst. (2015)
- et al. Improved global-best particle swarm optimization algorithm with mixed-attribute data classification capability. Appl. Soft Comput. (2014)
- et al. A fuzzy conceptualization model for text mining with application in opinion polarity classification. Knowl.-Based Syst. (2013)
- et al. Para: A positive-region based attribute reduction accelerator. Inform. Sci. (2019)
- et al. Lightweight quantum encryption for secure transmission of power data in smart grid. IEEE Access (2019)
- et al. Big data issues in smart grids: A survey. IEEE Syst. J. (2019)