Elsevier

Computer Communications

Volume 162, 1 October 2020, Pages 160-168

A practical design of hash functions for IPv6 using multi-objective genetic programming

https://doi.org/10.1016/j.comcom.2020.08.013

Abstract

Hash functions are widely used in high-speed network traffic measurement. A high-quality hash function should be collision-free and fast to execute. Existing work has developed methods to generate hash functions for IPv4 data, but IPv6 data, with much longer addresses and different data characteristics, may reduce the effectiveness of those methods. In this paper, we present a practical design of hash functions for IPv6 measurement, based on an entropy analysis of IPv6 network data and an automated multi-objective genetic programming (GP) method. Considering our specific application of hash functions, we use three fitness functions as the optimization objectives: active flow estimation, uniformity, and seed avalanche effect. Active flow estimation is the main objective because it is the specific measurement task. In implementing multi-objective GP, we adopted a strategy, informed by a preliminary experimental investigation, that limits the evolved hash functions to shorter execution times than other hash functions. Experiments were conducted to construct hash functions for WIDE IPv6 network data. The results show that our generated hash functions perform well across different evaluation criteria: they are superior in active flow estimation and execution time, and they compete with state-of-the-art hash functions in uniformity and in generating independent hash values for data structures such as Bloom filters.

Introduction

With the rapid growth of network bandwidth, the massive traffic on high-speed backbones poses significant challenges to network measurement. Data stream technology extracts knowledge from massive real-time data arriving at a high rate using one-pass processing [1]. It is usually implemented with data structures suited to different measurement tasks, such as Bloom filters [2], [3] and sketches [4], [5]. One fundamental component of these data structures is the hash function, and the processing performance of network devices is affected by the hash function used in some modules. Today, Internet backbones run at 10 Gbps and 40 Gbps and are moving to 100 Gbps. Next-generation switches can support more than one port of 40 Gbps and 100 Gbps, with switching capacities at the 1 to 10 Tbps level. Therefore, an efficient hash function for network measurement is needed to ensure that these technologies maintain high performance in high-speed networks.

New hash functions have appeared in succession, such as FNV [6], MurmurHash [7] and CityHash [8]. Most of these hash functions were designed manually by experts to guarantee good performance on different types of input. Admittedly, designing a high-quality hash function is a tough and time-consuming task, due to the obscure and intricate relationships between parameters. This motivates the use of Genetic Programming (GP), which is effective and efficient in feature selection and has recently been used in many fields [9], [10]. Some research already applies GP to construct hash functions automatically, either for general purposes or for specific applications. Most hash functions are designed to operate on arbitrarily sized inputs. Nevertheless, it is still possible to customize hash functions for special applications whose performance could benefit from fixed-length input or a special data distribution. Accordingly, existing works use Linear GP to design hash functions for fixed-length IPv4 tuples [11], [12], and these outperform general-purpose hash functions, especially in execution speed. For IPv6 data, however, the 128-bit address length results in longer hash inputs, which is likely to multiply the execution time of hash functions. Also, because of the larger address space, the randomness of some bit segments of IPv6 addresses is usually much lower than that of IPv4 addresses in a specific network environment. Such a data distribution might not be beneficial for hash function design, or might demand more complex hash functions. Therefore, leveraging the data distribution to construct hash functions with better performance and a smaller increase in execution time is more worthwhile for IPv6 data than for IPv4 data, whose distribution is relatively uniform. Moreover, even though some work applied GP to construct hash functions for network applications by training on network data, the fitness functions used were not much different from those of general-purpose methods.

To design hash functions for IPv6 network data, we propose a practical design scheme specifically for five-tuple identifiers of IPv6 flows. Based on our observation and analysis of the characteristics of IPv6 data in different backbone networks, we adopt a recompose step, driven by entropy analysis, for handling the IPv6 input. We then use multi-objective genetic programming based on NSGA-II [13] to evolve the function body of the hash function. By applying three fitness functions, the evolved hash functions are expected to satisfy both general and application-oriented requirements, and to be able to generate multiple hash values for the data structures used in network measurement. The main contributions of this paper are as follows.

  • We developed a design scheme for hash functions with input processing based on learning, in advance, the entropy distribution in specific network environments. This design was motivated by our observation of one week of IPv6 network data from WIDE [14] and a campus backbone.

  • We evolved hash functions using multi-objective genetic programming. For our specific measurement task, we additionally use active flow estimation as one fitness function. To generate hash functions that execute quickly, we implemented the evolution with an execution-time control strategy.

  • We provided a comprehensive evaluation of hash functions on IPv6 datasets. In evaluations on IPv6 network data, our evolved hash functions were superior to other state-of-the-art hash functions in active flow estimation and execution speed, and were comparable in some other aspects.
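To make the active flow estimation task named in the contributions more concrete, the following is a minimal sketch of one common way to count distinct flows with a hash function: a bitmap with linear counting. This is an illustrative assumption, not necessarily the estimator used in the paper. The names `default_hash` and `estimate_active_flows`, the bitmap size, and the BLAKE2b stand-in hash are all hypothetical; an evolved hash function would be passed in through `hash_fn`.

```python
import hashlib
import math


def default_hash(flow_id: bytes) -> int:
    """Stand-in hash (assumption): BLAKE2b truncated to 64 bits."""
    digest = hashlib.blake2b(flow_id, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def estimate_active_flows(flow_ids, bitmap_size=1024, hash_fn=default_hash):
    """Estimate the number of distinct flows with bitmap linear counting."""
    bitmap = bytearray(bitmap_size)
    for flow_id in flow_ids:
        # Every packet of the same flow sets the same bit, so repeats are free.
        bitmap[hash_fn(flow_id) % bitmap_size] = 1
    zeros = bitmap_size - sum(bitmap)
    if zeros == 0:
        raise ValueError("bitmap saturated; use a larger bitmap_size")
    # Linear counting estimate: n ~ -m * ln(fraction of zero bits).
    return -bitmap_size * math.log(zeros / bitmap_size)


# 1000 packets belonging to 200 distinct (synthetic) flow identifiers.
packets = [f"flow-{i % 200}".encode() for i in range(1000)]
estimate = estimate_active_flows(packets)
print(f"estimated active flows: {estimate:.1f}")
```

In a setup like this, the accuracy of the estimate depends on how evenly the hash spreads flow identifiers over the bitmap, which is one way a measurement task can serve as a fitness signal for a hash function.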

The rest of the paper is organized as follows. Section 2 reviews related work on GP-based hash function evolution. Section 3 proposes our design scheme for IPv6 hash functions. Section 4 introduces the multi-objective GP implementation for generating hash functions. Section 5 describes the experiments and the comparative evaluation results. Section 6 concludes the paper and discusses future work.

Section snippets

Related work

Estébanez et al. [15], [16] first introduced tree-based Genetic Programming to evolve hash functions. The hash functions were guaranteed collision-free by using the avalanche effect as the fitness function. The execution speed was assured by using fast operators and limiting the number of nodes. As a supplement and improvement to their previous work, Estébanez et al. [17] used collision rate, uniformity and avalanche effect respectively as the fitness function to guide the evolution of hash

Design framework for IPv6 hash functions

We first introduce our design framework for IPv6 hash functions, based on our preliminary analysis of IPv6 network data in backbone networks.
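As an illustration of the kind of entropy analysis this framework builds on, the sketch below computes the Shannon entropy of each byte position across a set of IPv6 addresses and ranks the positions by entropy. It is a minimal sketch under stated assumptions: the byte-level granularity, the function names `byte_entropies` and `rank_positions`, and the sample addresses are hypothetical and are not taken from the paper.

```python
import ipaddress
import math
from collections import Counter


def byte_entropies(addresses):
    """Shannon entropy (bits) of each of the 16 byte positions of IPv6 addresses."""
    packed = [ipaddress.IPv6Address(a).packed for a in addresses]
    total = len(packed)
    entropies = []
    for pos in range(16):
        counts = Counter(p[pos] for p in packed)
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def rank_positions(entropies):
    """Byte positions ordered from highest to lowest entropy (hypothetical helper)."""
    return sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)


# Documentation-prefix addresses (2001:db8::/32) that differ only in the last byte.
sample = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4"]
entropies = byte_entropies(sample)
ranking = rank_positions(entropies)
print("most variable byte position:", ranking[0])
```

On real traffic, positions with near-zero entropy (for example, a shared prefix) contribute little to distinguishing flows, which is the kind of observation that could guide how the hash input is recomposed.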

Hash function evolution based on multi-objective GP

In this section, we describe the process of evolving hash functions using multi-objective GP, including the basic components of GP and our optimization objectives for hash functions. Genetic Programming is an approach to solving Genetic Algorithm (GA) [27] problems that automatically searches for highly adaptive individuals in the solution space of computer programs [28]. GP evolves individuals for specific tasks using genetic operators including selection, crossover and mutation. The

Experimental results

This section presents the experimental results for the IPv6 hash functions generated by our proposal, together with comparisons against state-of-the-art hash functions from various aspects.

Conclusion

In this paper, we proposed a design scheme of hash functions for IPv6 network data. We process the IPv6 hash input based on an entropy analysis of IPv6 data from WIDE [14] and a campus backbone. We apply a multi-objective GP method to evolve hash functions that meet the requirements of measurement. The overall design of the hash function evolution considers both the general requirements of hash functions and the specific application purpose. Three optimization objectives (active flow estimation,

CRediT authorship contribution statement

Ying Hu: Conceptualization, Methodology, Software, Validation, Investigation, Data curation, Writing - original draft, Visualization. Guang Cheng: Resources, Visualization, Writing - review & editing, Supervision, Project administration, Funding acquisition. Yongning Tang: Writing - review & editing. Feng Wang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1800602, the Ministry of Education-China Mobile Research Fund Project (MCM20180506), and the CERNET Innovation Project, China (NGIICS20190101, NGII20170406).

References (37)

  • Doerr B. et al., Evolving boolean functions with conjunctions and disjunctions via genetic programming

  • Grochol D. et al., Evolutionary design of fast high-quality hash functions for network applications

  • Grochol D. et al., Multi-objective evolution of hash functions for high speed networks

  • Deb K. et al., A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. (2002)

  • WIDE project (2019)

  • Estébanez C. et al., Evolving hash functions by means of genetic programming

  • Estébanez C. et al., Finding state-of-the-art non-cryptographic hashes with genetic programming

  • Estébanez C. et al., Automatic design of noncryptographic hash functions using genetic programming, Comput. Intell. (2014)