A practical design of hash functions for IPv6 using multi-objective genetic programming
Introduction
With the rapid growth of network bandwidth, the massive traffic on high-speed backbones poses significant challenges to network measurement. Data stream technology extracts knowledge from massive real-time data arriving at a high rate through one-pass processing [1]. It is usually implemented with data structures suited to different measurement tasks, such as Bloom filters [2], [3] and sketches [4], [5]. One fundamental component of these data structures is the hash function, and the processing performance of network devices is affected by the hash function used in some modules. Nowadays, Internet backbones run at 10 Gbps and 40 Gbps and are moving to 100 Gbps. Next-generation switches can support multiple 40 Gbps and 100 Gbps ports, with a switching capacity on the order of 1 to 10 Tbps. Therefore, an efficient hash function for network measurement is needed to ensure that these technologies maintain high performance in high-speed networks.
New hash functions have appeared in succession, such as FNV [6], MurmurHash [7] and CityHash [8]. These hash functions were mostly designed manually by experts to guarantee good performance on different types of input. Admittedly, designing a high-quality hash function is a difficult and time-consuming task because of the obscure and intricate relationships between parameters. This motivates the use of Genetic Programming (GP), which is effective and efficient in feature selection and has recently been used in many fields [9], [10]. Some research has already applied GP to construct hash functions automatically, either for general purposes or for specific applications. Most hash functions are designed to operate on arbitrarily sized inputs. Nevertheless, it is still possible to customize hash functions for special applications whose performance could benefit from fixed-length input or a special data distribution. Given that, existing works use Linear GP to design hash functions for fixed-length IPv4 tuples [11], [12], which outperformed general-purpose hash functions, especially in execution speed. For IPv6 data, however, the 128-bit address length results in longer hash inputs, which is likely to multiply the execution time of hash functions. In addition, because of the larger address space, the randomness of some bit segments of IPv6 addresses is usually much lower than that of IPv4 addresses in a specific network environment. This data distribution might not be beneficial for hash function design, or might demand higher complexity from hash functions. Therefore, it is more worthwhile to leverage the IPv6 data distribution to construct hash functions with better performance and a smaller increase in execution time than it is for IPv4 data, whose distribution is relatively uniform.
Moreover, even though some work applied GP in constructing hash functions for network applications by training on network data, the fitness functions used were not much different from those of general-purpose methods.
To design hash functions for IPv6 network data, we propose a practical design scheme specifically for the five-tuple identifiers of IPv6 flows. Based on our observation and analysis of the characteristics of IPv6 data in different backbone networks, we adopt a recompose step, guided by entropy analysis, to process the IPv6 input. We then use multi-objective genetic programming based on NSGA-II [13] to evolve the function body of the hash function. By applying three fitness functions, the evolved hash functions are expected to satisfy both general and application-oriented requirements, and to be capable of generating multiple hash values for the data structures used in network measurement. The main contributions of this paper are as follows.
- We developed a design scheme for hash functions in which input processing is based on learning, in advance, the entropy distribution in specific network environments. This design was motivated by our observation of one week of IPv6 network data from the WIDE [14] and Campus backbones.
- We evolved hash functions using multi-objective genetic programming. For our specific measurement task, we additionally use active flow estimation as one fitness function. To generate hash functions that execute quickly, we implemented the evolution with an execution-time control strategy.
- We provided a comprehensive evaluation of hash functions on an IPv6 dataset. In the evaluation on IPv6 network data, our evolved hash functions were superior to other state-of-the-art hash functions in active flow estimation and execution speed, and were comparable in some other aspects.
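The entropy analysis that motivates the input processing can be illustrated with a minimal sketch. It computes the Shannon entropy of each byte position across a set of IPv6 addresses and then picks the positions above a threshold. The function name `bytewise_entropy`, the byte-level granularity, the threshold of 1.0 bit, and the sample addresses are all assumptions made for illustration; they are not taken from the paper, whose segment granularity and selection rule may differ.

```python
import ipaddress
import math
from collections import Counter


def bytewise_entropy(addresses):
    """Return the Shannon entropy (in bits) of each of the 16 byte positions."""
    packed = [ipaddress.IPv6Address(a).packed for a in addresses]
    total = len(packed)
    entropies = []
    for pos in range(16):
        counts = Counter(p[pos] for p in packed)
        # H = sum over observed byte values of -p * log2(p)
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical sample: a shared prefix with varying low-order bytes.
sample = [f"2001:db8::{i:x}" for i in range(1, 257)]
entropy = bytewise_entropy(sample)

# Assumed selection rule: keep byte positions carrying more than 1 bit of entropy.
high_entropy_positions = [pos for pos, h in enumerate(entropy) if h > 1.0]
```

In this toy sample only the last byte varies substantially, so it is the only position selected; on real backbone traces the profile would depend on the network environment.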
The rest of the paper is organized as follows. Section 2 reviews related work on GP-based hash function evolution. Section 3 proposes our design scheme for IPv6 hash functions. Section 4 introduces the multi-objective GP implementation for generating hash functions. Section 5 describes the experiments and compares the evaluation results. Section 6 concludes the paper and discusses future work.
Related work
Estebanez et al. [15], [16] first introduced tree-based Genetic Programming to evolve hash functions. The hash functions were guaranteed to be collision-free by using the avalanche effect as the fitness function. Execution speed was ensured by using fast operators and limiting the number of nodes. As a supplement and improvement to their previous work, Estebanez et al. [17] used collision rate, uniformity and avalanche effect, respectively, as the fitness function to guide the evolution of hash
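As an illustration of how an avalanche-based fitness could be measured, the following minimal sketch flips one random input bit per trial and reports the average fraction of output bits that change; a value near 0.5 is the usual ideal. The 32-bit FNV-1a function is used only as a stand-in hash under test, and `avalanche_score`, its parameters, and the 37-byte key length (an assumed IPv6 five-tuple layout) are hypothetical and do not reproduce the fitness functions of the cited works.

```python
import random


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a, used here only as a stand-in hash under test."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def avalanche_score(hash_fn, key_len=37, trials=200, out_bits=32, seed=0):
    """Average fraction of output bits that flip when one input bit flips.

    key_len=37 assumes an IPv6 five-tuple: two 16-byte addresses,
    two 2-byte ports and one protocol byte.
    """
    rng = random.Random(seed)
    flipped = 0
    for _ in range(trials):
        key = bytearray(rng.getrandbits(8) for _ in range(key_len))
        base = hash_fn(bytes(key))
        bit = rng.randrange(key_len * 8)
        key[bit // 8] ^= 1 << (bit % 8)  # flip exactly one input bit
        flipped += bin(base ^ hash_fn(bytes(key))).count("1")
    return flipped / (trials * out_bits)


score = avalanche_score(fnv1a_32)
```

Any callable that maps `bytes` to an integer of `out_bits` width can be passed as `hash_fn`, so the same routine could score candidate functions during evolution.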
Design framework for IPv6 hash functions
We first introduce our design framework for IPv6 hash functions, which is based on our preliminary analysis of IPv6 network data in backbone networks.
Hash function evolution based on multi-objective GP
In this section, we describe the process of evolving hash functions using multi-objective GP, including the basic components of GP and our optimization objectives for hash functions. Genetic Programming is an approach to solving Genetic Algorithm (GA) [27] problems that automatically searches for highly adapted individuals in the solution space of computer programs [28]. GP evolves individuals for specific tasks using genetic operators including selection, crossover and mutation. The
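To make the multi-objective selection idea concrete, the sketch below shows Pareto dominance and the extraction of the first non-dominated front, which is the ranking notion that NSGA-II builds on. It is a simplified illustration only: it omits crowding distance and the GP operators, and the objective vectors are hypothetical values with all three objectives assumed to be minimized.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_front(population):
    """Return the individuals that no other individual dominates."""
    return [
        p for p in population
        if not any(dominates(q, p) for q in population if q is not p)
    ]


# Hypothetical objective vectors (three objectives, lower is better).
candidates = [
    (0.10, 0.30, 5.0),
    (0.20, 0.20, 4.0),
    (0.15, 0.35, 6.0),  # dominated by the first candidate
    (0.05, 0.40, 7.0),
]
front = non_dominated_front(candidates)
```

Here three of the four candidates survive because each is better than the others on at least one objective, which is why a multi-objective search returns a set of trade-off solutions rather than a single winner.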
Experimental results
This section presents the experimental results for the IPv6 hash functions generated by our proposal, and compares them with state-of-the-art hash functions in several respects.
Conclusion
In this paper, we proposed a design scheme of hash functions for IPv6 network data. We process the IPv6 hash input based on an entropy analysis of IPv6 data from the WIDE [14] and Campus backbones. We apply a multi-objective GP method to evolve hash functions that meet the requirements of measurement. The design of the hash function evolution considers both the general requirements of hash functions and the specific application purpose. Three optimization objectives (active flow estimation,
CRediT authorship contribution statement
Ying Hu: Conceptualization, Methodology, Software, Validation, Investigation, Data curation, Writing - original draft, Visualization. Guang Cheng: Resources, Visualization, Writing - review & editing, Supervision, Project administration, Funding acquisition. Yongning Tang: Writing - review & editing. Feng Wang: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1800602, the Ministry of Education-China Mobile Research Fund Project (MCM20180506), and the CERNET Innovation Project, China (NGIICS20190101, NGII20170406).
References (37)
- et al., An improved data stream summary: the count-min sketch and its applications, J. Algorithms (2005)
- et al., Genetic programming for energy-efficient and energy-scalable approximate feature computation in embedded inference systems, IEEE Trans. Comput. (2018)
- et al., Algorithmic improvements for fast concurrent cuckoo hashing
- et al., Models and issues in data stream systems
- et al., Network applications of bloom filters: A survey, Internet Math. (2004)
- T. Yang, A.X. Liu, M. Shahzad, Y. Zhong, Q. Fu, Z. Li, G. Xie, X. Li, A Shifting Bloom Filter Framework for Set...
- T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao, X. Li, S. Uhlig, Elastic sketch: adaptive and fast...
- Fowler/Noll/Vo (FNV) hash (1991)
- MurmurHash 2010 (2003)
- et al., Introducing CityHash (2011)