1 Introduction

A technique of genetic programming (GP) [17, 18] is an algorithm to optimize structured data based on a evolutionary algorithm (EA) [11, 25]. GP is applied to various fields such as program synthesis [5], function generations [14] and rule set discoveries [30]. Although GP is very effective for optimizing structured data, it has several problems such as getting into a bloat, inadequate optimization of constant nodes, being easily captured in local optimal solution area when applied to complicated problems. The main cause of the bloat is a crossover operator which exchanges partial trees of parent individuals [2, 3, 7, 27], where this paper focuses on the optimization of tree-structure data by means of GP. Several techniques to reduce the bloat have been proposed by improving the simple crossover operation [6, 10, 13, 18, 19, 26]. Although these methods have successfully inhibited bloat to a certain extent, effective search has not necessarily been performed. Moreover, there is no theoretical basis that crossover is effective for optimizing the tree-structure data.

Apart from reduction of the bloat, a search method for optimizing the graph structure has been proposed [15]. Although this method is suitable for searching various size of the structured data consisting of nodes and branches, the algorithm is complicated and the computation cost is very high. In this paper, we exclude the crossover operator which is the cause of the bloat in GP, and propose a partial sampling (PS) operator [29] as a new mating operator. In PS operator, first of all, a partial sample of a partial tree structure is extracted from several individuals of a parent individual group by a procedure of a proliferation. Next, the partial tree structure obtained by the proliferation is combined with a new tree structure by a metastasis. In this paper, two types of metastasis are prepared for GP, one that depends on the original upper node and the other that does not. Repeating the proliferation and the metastasis regenerates a new tree-structure data for the next generation.

Moreover, a multi-objective EA (MOEA) technique for suppressing the bloat problem and acquire many kinds of various tree-structure data is applied for GP by adding two more objective functions. One of the newly added objective functions is the size of the tree-structure data. Furthermore, the relative position of the target individual in the population in terms of the structural distance (SD) is also evaluated as an objective function. Then, the optimization of the tree-structure data is formulated as a multi-objective optimization problem (MOP) based on these three objective functions. NSGA-II [8, 9] is applied to this MOP. In the conventional NSGA-II, a crowding distance (CD) is applied for ranking the front set overflowing from the parent group. Because the conventional NSGA-II sorts such the individuals of the overflowing front set with CD and selects extreme solution, diversity about tree structure is not maintained. In this paper, SD is applied, instead of CD, for ranking the overflowing front set from the parent group.

The proposed technique and the conventional techniques are applied to a double spiral problem [6, 38] for verification. This problem is a classification problem containing two classes of point sets arranged on a spiral shape to be classified with a function. This problem is well known as one of difficult problem to solve with a neural network.

The number of the index of the goodness of the tree-structure data When the index of the goodness of tree-structure data is two or more, the number of objective functions in the above problem becomes four or more. We also propose an effective many-objective EA (MaOEA) applicable to such the many-objective GP. Many-objective optimization problems (MaOPs) are difficult to solve and is tackled by many researchers [4, 8, 9, 39,40,41]. Although SPEA2 [39,40,41] and NSGA-II [8, 9] are well known as powerful algorithm for MOPs, they do not work so effectively for MaOPs [1, 12, 33]. In this paper, we handle the case of solving an MaOP by NSGA-II based algorithm.

When applying NSGA-II or SPEA2 to MaOP, as the objective number increases, most of the solutions in the solution set, or population, become a relation that is not superior or inferior to each other. This relation is called non-dominated (ND) relationship. As a result, the convergence of the obtained set of Pareto Optimal Solutions (\({\mathcal{POS}}\)) to the optimum Pareto front remarkably decreases. Sato et al. [34] have proposed a concept of Pareto partial dominance that makes it easier to determine the superiority/inferiority relationship between solutions by using several objective functions instead of all objective functions as an algorithm for such MaOP. Since NSGA-II based on Pareto partial dominance (NSGA-II-PPD) focuses on a relatively small number of objectives, solutions are easy to decide superiority/inferiority even on MaOP, and an effective selection pressure can be expected.

SPEA2 with a shift-based density estimation (SDE) strategy [20, 21, 23] is also very strong algorithm to solve multi-objective optimization problems. This method requires a lot of computational cost to forcibly rank individual subsets in non-dominant relationships. Also when optimizing the tree structure by SPEA2 technique with SDE, it has been difficult to suppress bloat. Therefore, this research focuses on CD which is advantageous in terms of simplicity and less computational cost. And this paper proposes SD for the purpose of suppressing the bloat.

NSGA-II-PPD has the following three problems. The first problem is that a combination list of the number of objects to be used for Pareto partial dominance must be specified before the optimization. The second one is that an appropriate number of selected objectives according to the complexity of the problem in undecided. Moreover, the contents of the combination list greatly influence the optimization result. NSGA-II-PPD performs ND sorting using all objective functions at a specific generation cycle, and preserves parents as an archive set for the next generation. This process generates child individuals having the same contents as the already existing individual in the archive set in some cases. As a result, the same individuals increases in the first front set, which disturbs effective ranking in the front selection. This is the third problem. By consideration of these problems, this paper proposes a simple scheduling technique of partial objective set used for Pareto partial dominance and a technique of killing individuals having the same contents in preserving the archive set [28]. In order to verify the effectiveness of the proposed techniques, we examine a many-objective 0/1 knapsack problem [41].

2 Partial sampling operator for mating

One of the main causes of the bloat is the crossover operator generally applied in the conventional GPs, used for regenerating a new tree-structure data. This paper proposes to exclude the crossover operator from the conventional GP and to apply PS operator for regeneration of a new tree-structure data instead of the crossover operator. The PS operator creates a new tree-structure data by partially sampling tree structures from a parent individual and joining them together. This procedure is called a proliferation. The proliferation is terminated according to the probability, \(p_{\mathrm{t}}\). Partially sampled subtree structures by the proliferation are combined together by a metastasis. Two types of the metastasis are prepared, one that depends on the original upper node and the other that does not. We call the the former as an upper node depend metastasis and the latter as a random metastasis.

In the initial proliferation, a root node, \(n_{i,{\mathrm{root}}}\), of an individual, \({{\mathrm{indiv}}}_i\), randomly selected from a parent group \({\mathbf{P}}^{g}\) is copied to a set \({\mathbf{T}}_{\mathrm{sub}}\) as shown in Fig. 1. The initial proliferation is started from the root node, \(n_{i,{\mathrm{root}}}\), of the individual, \({{\mathrm{indiv}}}_i\). In this example, the starting root node contains an identification, A. Let \({\mathbf{N}}_{\mathrm{cand}}\) be a set of all lower nodes under the node of \({\mathbf{T}}_{\mathrm{sub}}\), where that node is not selected as a node of \({\mathbf{T}}_{\mathrm{sub}}\) yet. One node is randomly selected from \({\mathbf{N}}_{\mathrm{cand}}\) and copied to \({\mathbf{T}}_{\mathrm{sub}}\). The proliferation terminates according to the proliferation terminate probability, \(p_{\mathrm{t}}\), or when \({\mathbf{N}}_{\mathrm{cand}} = \emptyset\). When the proliferation is terminated, the set \({\mathbf{T}}_{\mathrm{sub}}\) thus obtained is copied to \({\mathbf{T}}_{\mathrm{new}}\), where is a set of nodes as a new tree-structure data. The set \({\mathbf{T}}_{\mathrm{sub}}\) is initialized to \(\emptyset\). Furthermore, the root node of the partial tree structure \({\mathbf{T}}_{\mathrm{new}}\) in the initial proliferation is randomly generated in a low probability on the initial proliferation.

Fig. 1
figure 1

The initial proliferation in PS operator

In the conventional GP with variable structure length, small partial structures are assembled by an regenerating operator, for example, crossover or mutation, and these partial structures are combined to generate a new tree-structure data of a large size [31, 32]. When the conventional GP increases the average size of the tree-structure data, the size of the partial structure also preserved for the next generation increases. Therefor, scheduling the probability, \(p_{\mathrm{t}}\), as follows prevents the size of the partial tree structure from explodingly increasing.

$$\begin{aligned} p_{{\mathrm{t}}}^{0} = \frac{1}{\mathrm{AverageSize}({\mathbf{R}}^{g})} , \end{aligned}$$
(1)
$$\begin{aligned} p_{{\mathrm{t}}}^{g+1} = \frac{{\mathrm{Succ}}({\mathbf{R}}^{g})-p_{{\mathrm{t}}}^{0} \cdot {\mathrm{Succ}}({\mathbf{P}}^{g})}{{\mathrm{Succ}}({\mathbf{P}}^{g})-p_{{\mathrm{t}}}^{0} \cdot {\mathrm{Succ}}({\mathbf{R}}^{g})} \left( p_{{\mathrm{t}}}^{g}-p_{{\mathrm{t}}}^{0} \right) +p_{{\mathrm{t}}}^{0} , \end{aligned}$$
(2)

where \({\mathbf{R}}^{g}\) denotes the population at the g-th generation, \({\mathbf{P}}^{g} \subset {\mathbf{R}}^{g}\) denotes the parent set at the g-th generation,

\({\mathrm{AverageSize}}({\mathbf{R}}^{g})\) denotes a function returning the average size of each tree structure of the population, and \({\mathrm{Succ}}( \cdot )\) denotes a function returning the average size of the partial tree structure that the argument set takes over from the previous generation.

A partial tree structure is grown by applying one of two kinds of metastasis to the partial tree structure obtained by the initial proliferation. One of two kinds of metastasis is a random metastasis. The random metastasis activates according to a metastasis selection probability, \(p_{{\mathrm{met}}}\). The other one metastasizes depending on the upper node. The upper node depend metastasis activates according to the probability, \(1-p_{{\mathrm{met}}}\). The partial tree structure \({\mathbf{T}}_{\mathrm{new}}\) shown in the Fig. 1 has three empty branched numbered as 1, 2 and 3. The branch 1 has the upper node A, and the branches 2 and 3 have upper node D. Now, suppose that the branch 1 is selected as a target of the upper node depend metastasis. In the next proliferation, a node having the upper node A is selected from the parent group, \({\mathbf{P}}^{g}\). On the other hand, if the random metastasis is applied to the partial tree structure \({\mathbf{T}}_{\mathrm{new}}\), the beginning node for the next proliferation is randomly selected from the parent group, \({\mathbf{P}}^{g}\).

Fig. 2
figure 2

Outline of how a new tree structure is created by PS operator

A new node is selected from the parent group, \({\mathbf{P}}^{g}\), according to the decided metastasis type. This node is not necessarily a root node. The proliferation is started from the selected node again.

By repeating the proliferation and the metastasis, new tree-structure data is generated as shown in Fig. 2. However, when the metastasis applied to only one parent individual, or when a parent individual having the same structure as the generated tree structure, the generated tree structure is eliminated and PS operator is performed again. The terminal nodes are given as a random number in a low probability, where this is based on the conventional mutation idea.

3 Multi-objective GP with structural distance

Optimizing the tree-structure data based only on the index of its own goodness brings problems that causes the bloat but also that the optimization is caught in a local optimum region. Depending on the structure of the local optimum region, the optimization stagnates, causing an illusion as if the obtained solution(s) were ultimate optimal. To avoid the risk of such the problems, this paper, therefor, proposes a technique to optimize the tree-structure data based on the size of the tree structure and SD in the population in addition to the index of the goodness of tree structure.

In this paper, three objective functions, \(h_1\), \(h_2\) and \(h_3\), are defined as follows to be used for the multi-objective optimization . An objective function according to the goodness of an individual, \({\mathrm{indiv}}_i\), is described by the following equation.

$$\begin{aligned} h_1 ({\mathrm{indiv}}_i ) = {{\mathrm{performance}}}({\mathrm{root}}_i) , \end{aligned}$$
(3)

where \({\mathrm{root}}_i\) denotes a root node of the individual, \({\mathrm{indiv}}_i\), and performance(rooti) denotes a function that returns value of the goodness of the tree structure beginning from the root node, rooti.

An objective function according to the size of tree structure is defined by the following equation.

$$\begin{aligned} h_2 ({\mathrm{indiv}}_i ) = \frac{1}{{\mathrm{Size}}({\mathrm{root}}_i)}, \end{aligned}$$
(4)

where Size(rooti) denotes a function that returns the number of the nodes of the tree structure beginning from the root node, rooti.

An objective function according to average of SD in the population is defined by the following equation.

$$\begin{aligned} h_3 ({\mathrm{indiv}}_i ) = \frac{1}{N_{\mathrm{pop}} } \sum _{k=1}^{N_{\mathrm{pop}} } {\mathrm{SD}}({\mathrm{root}}_i, {\mathrm{root}}_k ) , \end{aligned}$$
(5)

where \(N_{\mathrm{pop}}\) denotes the size of the population, and

\({\mathrm{SD}}({\mathrm{root}}_i , {\mathrm{root}}_k )\) denotes a function that returns SD between \({\mathrm{indiv}}_i\) and \({\mathrm{indiv}}_k\). In order to calculate SD, weights are given to all the nodes of the tree structured data by means of the following steps, when the tree structured data is initially generated. An example of giving weights to the tree structure is shown in Fig. 3.

Fig. 3
figure 3

An example of giving weights to the tree structure and an example of computation of SD

  • (Step 1) Give weight 1 to the root node.

  • (Step 2) Assume that W is a weight given to the current node.

  • (Step 3) Equally distribute weights to the lower nodes of the current node so that the total is W/2.

Two tree structures are compared in order from the root node to check conformity of both nodes as shown in Fig. 3. The distance, \({\mathrm{SD}}({\mathrm{root}}_i , {\mathrm{root}}_k )\), is initialized as zero. When different nodes are found in the conformity comparison, the weight of that node is added to the distance. The lower nodes below the different node are all ignored. Especially, when the tree structures of both are completely different, \({\mathrm{Distance}}({\mathrm{root}}_i , {\mathrm{root}}_k )\) is given 1 as the maximum value.

Fig. 4
figure 4

Conventional NSGA-II with CD

Now, we have defined the three-objective optimization problem. NSGA-II shown in Fig. 4 is applied to solve this problem. When several objective functions mean goodness of the tree structure and should not be joined together, the number of them and the two objective functions indicated above, \(h_2\) and \(h_3\), are the total number of objective functions in the proposed method in this paper. NSGA-II selects parent individuals by using non-dominated sorting and ranking with CD. Since tree-structure data is to be optimized in this paper, CD based only on the value of the objective function does not necessarily bring the diversity of the tree structure. Therefore, this paper propose to use SD when selecting parents from the rank set overflowing from the parent group. A block chart of the modified NSGA-II with SD is shown in Fig. 5.

Fig. 5
figure 5

Modified NSGA-II with SD

4 Many-objective evolutionary algorithm for MaOGP

MOP is a problem that optimizes, or maximizes in this paper, multiple objective functions under several constraints. Since the objective functions are in a trade-off relationship with each other, it is not possible, in general, to obtain the only one solution that completely satisfies all the objective functions. Therefore, we require to obtain \({\mathcal{POS}}\) of compromised solutions without superiority or inferiority to each other. For the objective function vector \({\mathbf{f}}\) consisting m objective functions, \(f_i\), the problem of finding the variable vector \({\mathbf{x}}\) that maximizes the value of \(f_i\) in the feasible region S in the solution space is defined as follows.

$$\begin{aligned} \left\{ \begin{array}{ll} {\mathrm{max.}} &\quad {\mathbf{f}}({\mathbf{x}}) = \left[ f_{1}({\mathbf{x}}), f_{2}({\mathbf{x}}), \ldots , f_{m}({\mathbf{x}}) \right] ^T \\ {\mathrm{s.t.}} &\quad \; {\mathbf{x}} \in {\mathbf{S}} \end{array}\right. \end{aligned}$$
(6)

When the values of the objective function, \(f_i\), of two solutions x and y satisfy the following relation, we say that the solution x dominates the solution y.

$$\begin{aligned}&{\mathbf{f}}({\mathbf{x}}) \succ {\mathbf{f}}({\mathbf{y}}) \\&\quad \triangleq \forall i \in {\mathbf{M}} : f_{i}({\mathbf{x}}) \geqq f_{i}({\mathbf{y}}) \wedge \exists i \in {\mathbf{M}} : f_{i}({\mathbf{y}}) > f_{i}({\mathbf{y}}) \end{aligned}$$
(7)

where \({\mathbf{M}}\) denotes a set of the indexes for the objective function, \(\{ 1,2, \ldots , m\}\). When there is no solution dominates a solution \({\mathbf{x}}\), the solution \({\mathbf{x}}\) is called non-inferior solution. A set of such the non-inferior solutions is defined as the following \({\mathcal{POS}}\).

$$\begin{aligned} {\mathcal{POS}} = \{ {\mathbf{x}} \in {\mathbf{S}} | \lnot \exists {\mathbf{y}} \in {\mathbf{S}}. {\mathbf{f}}({\mathbf{y}}) \succ {\mathbf{f}}({\mathbf{x}}) \} \end{aligned}$$
(8)

A Pareto front showing the the trade-off relation between the objective functions is defined as follows.

$$\begin{aligned} {\mathcal{FRONT}} = \{ {\mathbf{f}}({\mathbf{x}}) | {\mathbf{x}} \in {\mathcal{POS}} \} \end{aligned}$$
(9)

Several effective studies [4, 8, 9, 39,40,41] have been made on MOP as defined by Eq.(6). NSGA-II shown in Fig. 4 is a powerful multi-objective optimization scheme as a method proposed on one of these studies. NSGA-II applies non-dominated sorting (ND sorting) to the population \({\mathbf{Q}}\), and the individuals are classified to several ranked subsets, \({\mathbf{F}}_1,\) \({\mathbf{F}}_2,\) \({\mathbf{F}}_3,\) \(\ldots\). While not exceeding the size of the parent set \({\mathbf{P}}\), the individuals of each subset are moved to the parent set in order. Individuals of the subset that exceeds the size of the parent set is sorted using crowding distance (CD sorting) and moved to the parent set. The individuals not selected are culled. The mating operators generates the child set \({\mathbf{C}}\) from the parent set \({\mathbf{P}}\) by using the crossover and mutation operators.

Although NSGA-II effectively solves MOP with less than four objective functions, as the objective number m increases, an appropriate \({\mathcal{POS}}\) could not be obtained even by those methods containing the conventional NSGA-II. When ND is performed based on the conventional Pareto dominance using all m objective functions, as the number of objective function increases, a subset of solutions satisfying Eq.(7) is difficult to obtain [37]. Then most solutions of the population become non-inferior solutions. As a result, the superiority/inferiority relationship between solutions is difficult to determined, and the selection pressure in the optimization is significantly reduced. This paper focuses to NSGA-II with Pareto partial dominance shown in Fig. 6 for solving MaOP. Pareto partial dominance is based on a concept of partially applying Pareto domination to r objective functions extracted from all m objective functions. The Pareto partial dominance is defined by the following formula.

$$\begin{aligned}&{\mathbf{f}}({\mathbf{x}}) \sqsupset {\mathbf{f}}({\mathbf{y}}) \\&\quad \triangleq \forall i \in {\mathbf{R}} \subset {\mathbf{M}} : f_{i}({\mathbf{x}}) \\&\quad \geqq f_{i}({\mathbf{y}}) \wedge \exists i \in {\mathbf{R}} \subset {\mathbf{M}} : f_{i}({\mathbf{y}}) > f_{i}({\mathbf{y}}) \end{aligned}$$
(10)

where \({\mathbf{R}}\) denotes a set of r indexes selected from \({\mathbf{M}}\). Since conditions satisfying Pareto partial dominance are relaxed as compared with the conventional dominance using all m objective functions, the population is easier to rank finely in MaOP with large m.

In NSGA-II-PPD, first of all, given r, the number of objective functions to be considered in the partial ND sorting, a combination list of \(_{m} C _{r}\) selections is prepared beforehand. For each \(I_g\) generations, the combination of the objective functions to be considered for Pareto partial dominance is changed, and \({\mathbf{R}}_{g+1}\) is selected with performing ND sorting on \({\mathbf{P}}_{g} + {\mathbf{C}}_{g} + {\mathbf{A}}\) using all m objective functions and copied to the archive set \({\mathbf{A}}\), where + denotes the direct sum.

Fig. 6
figure 6

NSGA-II with Pareto partial dominance

5 Improvement of NSGA-II based on Pareto partial dominance

NSGA-II-PPD has the following three problems. The first problem is that the subset size of the objective functions to be used for Pareto partial dominance is required to beforehand specify before the optimization in a form of a list, or the combination list. The second one is that an appropriate value of the subset size according to the complexity of the problem is unknown. The contents of the combination list greatly influence the optimization result. On the other hand, the creation of the combination list is a very troublesome and difficult task for the user. NSGA-II-PPD performs ND sorting using all objective functions at a specific generation cycle, and preserves parents as an archive set for the next generation. This process generates child individuals having the same contents as the already existing individual in the archive set in some cases. As a result, the same individuals increases in the first front set, which disturbs effective ranking in the front selection. This is the third problem. In order to avoid these problems, this paper proposes two improvements. A block chart of the improved NSGA-II-PPD is shown in Fig. 7.

Fig. 7
figure 7

Improved NSGA-II with Pareto partial dominance

As the first improvement, a subset size scheduling (SSS) is proposed for NSGA-II-PPD. NSGA-II-PPD treated in this paper does not use the combination list for each \(I_g\) generation cycle. The parameter r is given by the following equations.

$$\begin{aligned} q= \frac{g\cdot m}{G}+ {\text{rand}}\_{\text{int}}(2B+1)-B, \end{aligned}$$
(11)
$$\begin{aligned} r = \left\{ \begin{array}{ll} B, &\quad q< B \\ q, &\quad B \leqq q < m \\ m, &\quad q \geqq m \end{array} \right. \end{aligned}$$
(12)

where m denotes the number of the objective functions, \({\text{rand}}\_{\text{int}}(\cdot )\) denotes a function returns a random integer less than the argument, B denotes an integer parameter larger than 1 and less than m/2, and G denotes the end generation. Figure 8 shows the possible value of the selection number, r.

Fig. 8
figure 8

The selection number, r, probabilistically takes a value on the colored range according to the generation g, where \({\text{rand}}\_{\text{int}}(\cdot )\) denotes a function returns a random integer less than the argument, B denotes an integer parameter larger than 1 and less than m/2, and G denotes the final generation

In NSGA-II-PPD, several individuals having the same contents as an individual already existing in the children, \({\mathbf{C}}_{t}\), or the archive set, \({\mathbf{A}}\), are generated and stored by the mating. If the optimization proceeds while sustaining such the individuals having relatively good evaluation, duplicates of them increases within the population. If the problem to be optimized is relatively simple, individuals with the same content are frequently generated during the optimization. The second improvement is elimination of such the individuals having the same contents of an individual already existing in the children, \({\mathbf{C}}_{g}\), and the archive set, \({\mathbf{A}}\), after the mating. In other words, the duplicates created by the mating is eliminated, we call this elimination of duplicates (EoD). Since the optimization problem treated in this paper is the maximizing problem, by setting the value of all objective functions of such the individual to 0, the individual are eliminated. The same content individual become the worst individual. After EoD, the mating does not reproduce the individual.

6 Verification of the proposed techniques

6.1 Double spiral problem

A double spiral problem is applied to verify an effectiveness of the proposed techniques. The double spiral problem is a problem of classifying two data sets arranged in a spiral shape, and it is known as a problem that is difficult to solve even using neural networks [6, 38]. These two data sets are arranged as shown in Fig. 9 and are to be classified by the following function f.

$$\begin{aligned} \left\{ \begin{array}{lll} f (x,y) > 0 &\quad \Longleftrightarrow &\quad (x,y) \in {\mathbf{D}}_1 ,\\ f (x,y) < 0 &\quad \Longleftrightarrow &\quad (x,y) \in {\mathbf{D}}_2 ,\\ f (x,y) = 0 &\quad \Longleftrightarrow &\quad {\mathrm{FALSE}} , \end{array}\right. \end{aligned}$$
(13)

where (xy) denotes the coordinates of each point on the two-dimensional plane, and \({\mathbf{D}}_1\) and \({\mathbf{D}}_2\) denote the data sets expressed with the red crosses and the blue circles shown in Fig. 9 respectively. In this paper, the case when \(f(x,y)=0\) is treated as classification failure at the point (xy).

Fig. 9
figure 9

Arrangement of two data sets for double spiral problem. The red cross denotes a point in the class \({\mathbf{D}}_1\) and the blue circle denotes a point in the class \({\mathbf{D}}_2\)

The following nodes are prepared as elements for constituting the classifying function f.

$$\begin{aligned} {\mathcal{N}}_N= \{+ , - , *, \div , \sin , \cos , \tan , {\mathrm{ifltz}}\} \end{aligned}$$
(14)
$$\begin{aligned} {\mathcal{N}}_T= \{x , y , {\mathrm{constant}}\} \end{aligned}$$
(15)

where \({\mathcal{N}}_N\) denotes a set of non-terminal node, \({\mathcal{N}}_T\) denotes a set of terminal node, and “\({\mathrm{ifltz}}\)” denotes a function with three arguments representing a conditional branch as follows,

$$\begin{aligned} {\mathrm{ifltz}}(a,b,c) \triangleq{ \text{``}}{\mathrm{if}} \ a<0\ {\mathrm{then}}\ b\ {\mathrm{else}}\ c{\text{''}} = \left\{ \begin{array}{ll} b &\quad (a<0) ,\\ c &\quad ({\mathrm{otherwise}}). \end{array}\right. \end{aligned}$$
(16)

The objective function, \(h_1\), according to the goodness of an individual is defined by the following equation.

$$\begin{aligned} h_1 ({\mathrm{indiv}}_i )&= {\mathrm{performance}}({\mathrm{root}}_i), \\&= \frac{1}{\left| {\mathbf{D}}_1\cup {\mathbf{D}}_2\right| } \sum _{k=1}^{\left| {\mathbf{D}}_1\cup {\mathbf{D}}_2\right| } g(x_k,y_k), \end{aligned}$$
(17)
$$\begin{aligned} g(x,y)= \left\{ \begin{array}{ll} 1 &\quad (f (x,y)> 0 \wedge (x,y) \in {\mathbf{D}}_1) ,\\ 0 &\quad (f (x,y) > 0 \wedge (x,y) \in {\mathbf{D}}_2) ,\\ 1 &\quad (f (x,y)< 0 \wedge (x,y) \in {\mathbf{D}}_2) ,\\ 0 &\quad (f (x,y) < 0 \wedge (x,y) \in {\mathbf{D}}_1) ,\\ 0 &\quad (f (x,y)=0). \end{array}\right. \end{aligned}$$
(18)

In this double spiral problem, the function, \({\mathrm{Size}}({\mathrm{root}}_i)\), applied for the objective function according to the size of tree structure is defined as the number of nodes of the tree structure.

In order to verify the effectiveness, the following four combinations are applied to the double spiral problem, combination of the conventional operators and CD (expressed as “’\({\mathrm{CO}}+{\mathrm{MU}}\) & CD”), combination of the conventional operators and SD (expressed as “CO+MU & SD”), combination of PS operator and CD (expressed as “PS & CD”) and combination of PS operator and SD (expressed as “PS & SD”). The conventional operators denotes the conventional crossover and the conventional mutations [13, 17, 18, 36]. The size of the population, \(N_{\mathrm{pop}}\), the running generations and the number of points in the double spiral problem, \(\left| {\mathbf{D}}_1\cup {\mathbf{D}}_2\right|\), are defined as 100, 1, 000, 000 and 190 respectively. The probability, \(p_{\mathrm{met}}\), for selecting the type of the metastasis is tried to 0.5, 0.25 and 0.75.

Figure 10 shows distributions on the \(h_2 - h_1\) plane of the first-front set in the final generation. As shown by Fig. 10, NSGA-II with combining PS operator and SD has given the best solution set, distributed in the upper right direction, in the widest range. The solutions obtained by NSGA-II with combining PS operator and CD has relatively high diversity but their evaluations are not so good. NSGA-II with combining the conventional operators and CD has given relatively good solutions but their diversity is low. NSGA-II with combining the conventional operators and SD has given the worst solution set with the lowest diversity.

Fig. 10
figure 10

Distribution on the \(h_2 - h_1\) plane of the first front set in the final generation when using each method

Fig. 11
figure 11

Comparison of distribution on the \(h_2-h_1\) plane of the first front set in the final generation when 3-objective and 2-objective optimizations are executed by using the PS operator with \(p_{\mathrm{met}}=0.50\) and SD for the ranking

Figure 11 shows a comparison of distribution on the \(h_2-h_1\) plane of the first front set in the final generation when 3-objective and 2-objective optimizations are executed by using the PS operator with \(p_{\mathrm{met}}=0.50\) and SD for the ranking. Compared to the distribution of solutions given by 2-objective optimization, the 3-objective optimization has acquired far better solutions in wider range. When PS operator with \(p_{\mathrm{met}}=0.50\) and CD are combined, the same result has been obtained as shown in Fig. 12. This shows an effectiveness of multi-objective optimization of the tree-structure data as proposed in this paper.

Fig. 12
figure 12

Comparison of distribution on the \(h_2 - h_1\) plane of the first front set in the final generation when 3-objective and 2-objective optimizations are executed by using the PS operator with \(p_{\mathrm{met}}=0.50\) and CD for the ranking

Figure 13 shows values of Norm [35] and MS [39] given by each method. In this figure, \({\mathrm{PS}}*.**\) denotes when PS operator with the metastasis selection probability, \(p_{\mathrm{met}}\), which is equal to \(*.**\) is used for the mating. Concerning both Norm and MS values, the best result has been obtained by the method using PS operator with \(p_{\mathrm{met}}=0.50\) and SD. The results using PS operator have gathered in the upper right of the figure, whereas the results using the conventional crossover and the conventional mutation have gathered in the lower left. This shows the superiority of PS operator. On the other hand, the advantage of SD has not been clearly shown by this experiment. SD have optimized relatively better only when combined with PS operator. NSGA-II even with SD has given the worst results when combined with the conventional operators. The reason for this result is considered as that SD has a low ability to preserve extreme solutions as CD does. In the case of the multi-objective optimization of the tree structure, the ability to retain the diversity of tree structures like the ranking with SD is necessary, then an improvement to add ability to preserve the extreme solutions like CD should be considered.

Fig. 13
figure 13

Comparison of results by each method on MS-Norm plane. \({\mathrm{CO}}+{\mathrm{MU}}\) denotes when the conventional crossover operator and the conventional mutation operators are used for the mating. \({\mathrm{PS}}*.**\) denotes when PS operator with the metastasis selection probability, \(p_{\mathrm{met}}\), which is equal to \(*.**\), is used for the mating

6.2 Many-objective 0/1 knapsack problem

In order to verify the effectiveness of the improved technique, a many-objective 0/1 knapsack Problem (MaOKSP) [42] is performed. MaOKSP composed of m knapsacks and j items. The capacity of the i-th knapsack is \(c_i\). The weight and the price of the j-th item are \(w_{ij}\) and \(p_{ij}\) respectively in the i-th knapsack. Let an individual \({\mathbf{x}} \in {0,1}^n\) be the n dimensional vector that selects the items. MaOKSP is defined by the following formula.

$$\begin{aligned}&\left\{ \begin{array}{ll} {\mathrm{max.}} &\quad {\mathbf{f}}({\mathbf{x}}) = \left[ f_{1}({\mathbf{x}}), f_{2}({\mathbf{x}}), \ldots , f_{m}({\mathbf{x}}) \right] ^T \\ \text{ s.t. } &\quad \sum _{j=1} ^{n} {w_{ij} \cdot x_{j}} \leqq c_i \end{array},\right. \end{aligned}$$
(19)
$$\begin{aligned}&\quad f_i({\mathbf{x}}) = \sum _{j=1} ^{n} {p_{ij} \cdot x_{j}} \quad {\mathrm{for}} \,\; i = 1,2,\ldots ,m, \end{aligned}$$
(20)

where the number of items n, the weight matrix \(w_{ij}\), the profit matrix \(p_{ij}\) and the knapsack capacity vector \(c_i\) are defined as follows,

$$\begin{aligned} n= 1000, \end{aligned}$$
(21)
$$\begin{aligned} w_{ij}\in (0, 1) \subset {\mathbf{R}}, \end{aligned}$$
(22)
$$\begin{aligned} p_{ij}\in (0, 1) \subset {\mathbf{R}}, \end{aligned}$$
(23)
$$\begin{aligned} c_{i}\in (0, 100) \subset {\mathbf{R}}. \end{aligned}$$
(24)

\({\mathcal{POS}}\) obtained by the optimization is evaluated by using Maximum Spread (MS) [39] and Norm [35].

Norm [35] and Maximum Spread (MS) [39] are applied for evaluation of each method. Norm denotes a measure of the convergence of the population to the Pareto front \({\mathcal{PF}}\) and is defined by the following equation.

$$\begin{aligned} \text{ Norm }({\mathcal{PF}}) = \frac{1}{ \left| {\mathcal{PF}}\right| } \sum _{j=1} ^{\left| {\mathcal{PF}}\right| } \sqrt{ \sum _{i=1} ^{m} f_{i}({\mathbf{x}}_{j})^2}, \end{aligned}$$
(25)

where \({\mathbf{x}}_{j}\) denotes the j-th individual of the Pareto front, \({\mathcal{PF}}\). The larger the Norm value, the closer the approximate Pareto front, \({\mathcal{PF}}\). MS denotes a measure of the spread of the first front at the final generation [39] and is defined by the following equation.

$$\begin{aligned} \text{ MS }({\mathcal{PF}}) = \sqrt{ \sum _{i=1} ^{m} \left( \text{ max }_{j=1}^{\left| {\mathcal{PF}}\right| } f_{i}({\mathbf{x}}_j) -\text{ min }_{j=1}^{\left| {\mathcal{PF}}\right| } f_{i}({\mathbf{x}}_j) \right) ^{2} } . \end{aligned}$$
(26)

The larger the MS value, the wider the spread of the population given by the optimization.

The conventional NSGA-II, NSGA-II-PPD when \(r=3\), \(r=6\) and \(r=8\), NSGA-II-PPD in the case of giving the combination list shown in Table 1 and the improved technique are carried out for the verification. The optimization is performed by setting the objective number to \(m=4,6,8,10\) and the iterative generations to \(G=1{,}000{,}000\).

Table 1 The combination list for NSGA-II-PPD

Figure 14 shows transition of the number of individuals of the first-front according to the generation in the case that \(m=10\) and \(I_g = 500\). In the figure, “NSGA-II” denotes the results by the conventional NSGA-II, “PPD(\(\hbox {r}=^{*}\))” denotes the results by NSGA-II-PPD with the constant value of \(r=*\), “PPD(list)” denotes the results by NSGA-II based on Pareto partial dominance with the combination list shown in Table 1, and “Improved” denotes the results by the algorithm proposed in this paper. The conventional NSGA-II and NSGA-II-PPD in \(r=8\) has given large number of the individuals of the first-front set throughout the optimization. NSGA-II-PPD with \(r=6\) has given the number next to them. At the end of the optimization, the improved technique has caught up with these values. NSGA-II-PPD with the combination list is also similar.

Fig. 14
figure 14

Transition of the number of individuals of the first-front according to the generation

Figure 15 shows Norm values after the optimization to the objective number m in the case that \(I_g = 500\). In any technique, the convergence to \({\mathcal{POS}}\) increases as the number of objectives increases. Although, regarding to the convergence, NSGA-II-PPD in the case that \(r=3\), NSGA-II-PPD with the combination list and the improved technique have given almost equivalent results, the conventional NSGA-II has given relatively poor results.

Fig. 15
figure 15

Comparison of Norm values to the object number m

Figure 16 shows MS values after the optimization to the objective number m in the case that \(I_g = 500\). The MS value, or the diversity of \({\mathcal{POS}}\), given by NSGA-II-PPD in the case that \(r=3\) decreases as the objective number increases, whereas it increases with the other three techniques. In the improved technique, since r increases as the generation progresses, the superiority/inferiority relationship of solutions becomes difficult to decide by Pareto partial dominance at the end of the optimization, and many individuals belong to the first-front set. As a result, since most individuals of the parents are ranked by the CD sorting, and it is considered that diversity has increased. NSGA-II-PPD with the combination list has shown diversity equal to or less than that of the improved technique. The reason that sufficient diversity has not been obtained by NSGA-II-PPD in the case that \(r=3\) is considered as because partial dominance by using all objectives has not been performed only between 900,000 and 1 million generations. Regarding the diversity of solutions, the conventional NSGA-II has given the highest value.

Fig. 16
figure 16

Comparison of MS values to the object number m

Figure 17 shows Norm values to the generation g in the case that \(m=10\) and \(I_g = 500\). In NSGA-II-PPD, the convergence to \({\mathcal{POS}}\) tends to decrease as the value of the parameter r increases. In this technique, when r approaches m, the solutions are hard to dominated by the partial dominance, so a large number of individuals are selected as the first-front set. As a result, sufficient ranking is not made in the non-dominated sorting, and the convergence has deteriorated. On the other hand, although the improved technique has shown the highest convergence at the beginning of the optimization, the convergence has declined at the final stage. In the improved technique, since the value of r increases as the generation progresses, the solutions become hard to dominated by the partial dominance. As a result, sufficient ranking is not made in the non-dominated sorting, and the convergence has deteriorated in the final stage.

Fig. 17
figure 17

Comparison of the Norm values to the generation g

Figure 18 shows MS values to the generation g in the case that \(m=10\) and \(I_g = 500\). Although the diversity in the cases of the conventional NSGA-II and NSGA-II-PPD in \(r=8\), maintains a high value throughout, the convergence is low as shown in Fig. 17, so it is not necessary to pay attention to them. On the other hand, the diversity is rising as the optimization progress in the case of the improved technique. Moreover, the improved technique brings relatively high convergence as shown in Fig. 17, so that the superiority of the improved technique is shown overall.

Fig. 18
figure 18

Comparison of the MS values to the generation g

7 Conclusion

In this paper, multi-objective optimization of tree-structure data, or MOGP, has been proposed, where the tree structure size and the structural distance (SD) are additionally introduced into the measure of the goodness of the tree structure as the objective functions. Furthermore, the partial sampling (PS) operator has been proposed to effectively search tree structure while avoiding the bloat. In order to verify the effectiveness of the proposed techniques, they have applied to the double spiral problem. By means of the multi-objective optimization of tree-structure data, we have found that more diverse and better tree structures are acquired. The proposed method incorporating PS operator and SD in NSGA-II has given relatively good results. However, since PS operator has low ability to numerically optimize constant nodes on the tree structure, it has not well worked effectively for the function optimization. In addition, since ranking with SD in NSGA-II has low ability to preserve extreme solutions in the objective function space, solutions not have been effectively selected.

When the index of the goodness of tree-structure data becomes two or more, the number of objective functions in MOGP becomes four or more, MaOGP. The improved NSGA-II-PPD applicable to such the MaOGP has been also proposed in this paper. In the improvement, we have proposed SSS and EoD.

The improved NSGA-II-PPD with SSS and EoD and other conventional techniques are applied to the many-objective 0/1 knapsack problem for verification of the effectiveness. The improved NSGA-II-PPD has given the higher diversity than other techniques as the number of the objective functions of the problem increases. On the other hand, the improved NSGA-II-PPD has given the convergence equal to or higher than the other techniques even when the number of the objective functions becomes large. By means of the proposed simple scheduling of the parameter r, sufficient convergence has been obtained in the early generations with the smaller r, and the diversity has been supplemented in the generations with the larger r at the end of the optimization.

In the future, a technique to incorporate numerical optimization ability such as a particle swarm optimization [16] and the mutation to PS operator and the ranking selection technique combining SD and CD should be considered in the future. The PS operator proposed in this paper has a mechanism to terminate the proliferation, but does not have no mechanism to forcibly exit from the PS operator. Such the mechanism to forcibly exit from the PS operator should be considered.

Since the improved NSGA-II-PPD still has given insufficient results in terms of the diversity, we need to improve this point while maintaining the current convergence. Although each technique has been applied to the relatively simple many-objective 0/1 knapsack problem in this paper, we need to apply to more complicated problems and verify the effectiveness. We also need to pursue the combination list and to compare the further improved NSGA-II-PPD and the conventional NSGA-II-PPD with the optimal combination list. And then, we need to propose an effective MaOGP by combining these improved techniques in the future.

In this paper, the quality indicator MS is applied to assess the diversity of the final solutions. However, MS is able to simply be affected by the convergence of the solutions, in favor of poorly-converged solutions. In this sense, MS is not necessarily effective for the assessment of the diversity. In the future research, it is necessary to consider techniques such as a diversity comparison indicator (DCI) [22] to assess the diversity of the solutions. On the other hand, the convergence of the solutions has been evaluated only using Norm. In this regard, the future research needs to visualize solutions in multi-objective optimization with parallel coordinates [24], which can partially reflect the convergence, spread and uniformity.