Elsevier

Applied Soft Computing

Volume 38, January 2016, Pages 754-761
A chaotic time series prediction model for speech signal encoding based on genetic programming

https://doi.org/10.1016/j.asoc.2015.10.003

Highlights

  • We obtain a speech signal model on a phase space reconstructed from the chaotic time series.

  • We use the GP algorithm and model standardization for speech signal preprocessing.

  • We obtain a set of explicit expression models for model analysis and classification.

  • A standard nonlinear model for the selected samples is obtained.

Abstract

This paper proposes a novel method for solving chaotic time series prediction models of speech signals. A phase space is reconstructed from the chaotic characteristics of the speech signal, and the genetic programming (GP) algorithm is used to solve for the prediction models on that phase space, with embedding dimension m and time delay τ. The chaotic time series models of the speech signal are then built. By standardizing these models and optimizing their parameters, a chaotic time series coding model of the speech signal with a certain generalization ability is obtained. Experimental results show that the proposed method obtains the prediction models more effectively and achieves better coding accuracy than linear predictive coding (LPC) algorithms and a neural network model.

Introduction

Linear predictive coding (LPC) has been widely used in speech signal processing. It exploits the correlation between speech samples, using a linear model that predicts the present or future sample value from the past p sample values. Subsequently, a large number of studies on the nonlinear characteristics of speech found that the speech signal and the discrete speech signal sequence exhibit a very complicated nonlinear process with obvious chaotic characteristics [1], [2], [3], [4]. Nonlinear science thus inspired the analysis of speech signals by nonlinear methods. Nonlinear models for speech signal coding became an active topic in speech signal processing, and applying chaos theory to speech signals has produced some results [4], [5], [6]. It has been shown that reconstructing the speech signal helps to analyze its characteristics more deeply and accurately [6]. Chaotic time series analysis therefore has important theoretical and practical value, not only for studying the chaotic characteristics and processing methods of speech signals, but also for constructing nonlinear speech signal processing models.
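For reference, the linear prediction that LPC performs can be written as follows. The notation is an assumption introduced here for illustration and does not come from the paper: s(n) is the speech sample at time n, a_k are the predictor coefficients, p is the prediction order, and e(n) is the prediction residual.

```latex
\hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n-k), \qquad e(n) = s(n) - \hat{s}(n)
```

The coefficients a_k are typically chosen to minimize the sum of squared residuals over a frame. A nonlinear predictor replaces the weighted sum with a nonlinear function of the past samples.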

The neural network model is the most commonly used method for nonlinear speech signal processing and has achieved some success [7], [8], [9]. It has been applied to speech recognition [10] and speech transmission [11], among other tasks.

Thyssen et al. [12] held that the speech signal is a combination of linear and nonlinear functions. They modeled the linear part with linear prediction, showed that the residual signal has obvious nonlinear characteristics, and modeled that residual, the nonlinear part of the signal, with Volterra and neural network methods. Combining the linear and nonlinear models improved coding accuracy. Matthew K. Luka built a recognition system for local ethnic languages using neural network technology [13]. The system takes Mel-frequency cepstral coefficients, extracted from the speech signal with Mel filter banks, as the input of a multilayer neural network. Training the network with the conjugate gradient backpropagation algorithm improved the convergence rate, and the system showed good recognition ability. Applying chaos theory to speech signal analysis, Lee and Tong [14] and Aina et al. [15] applied the traditional local linear model and the radial neural network to chaotic time series analysis and obtained two kinds of nonlinear speech coding predictors. The results showed that predictors built in the reconstructed phase space improved significantly on the linear predictor.

The neural network model has been used successfully in speech signal coding and provides a structured prediction model. In recent years, it has also achieved some results in the nonlinear prediction of speech signal sequences. However, a neural network model is defined by its optimized weights, which does not help speech signal analysis and processing: unlike LPC, it cannot provide a fixed model structure to describe the speech signal, and it must be rebuilt for different samples. Its practical application has therefore been limited. To solve these problems, this paper introduces genetic programming (GP) as the solving method for speech signal prediction models. Compared with the neural network method, GP yields an explicit model structure, which facilitates analysis and application. As early as 1998, Conrads et al. [16] applied GP to speech and found that it could evolve programs that discriminate certain spoken vowels and consonants. Xie and Zhang [17], [18] carried out a series of studies on rhythmic stress detection in spoken English using GP. Later, GP was proposed for constructing nonlinear speech coding models [19].

GP is a special optimization algorithm developed from the genetic algorithm (GA) [20]. It uses a hierarchical structure to describe programming problems, so that problems in different fields are reduced to searching for a computer program that satisfies fixed constraint conditions. From a limited number of samples, the GP algorithm can estimate the model structure and its parameters in parallel. In recent years, it has been widely used in nonlinear system modeling [21], data analysis [22] and other areas.
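To make the hierarchical structure concrete, here is a minimal sketch of how a GP individual can be represented as an expression tree, scored against samples, and mutated. It is an illustration under stated assumptions, not the paper's implementation: the tuple encoding, the operator set (add, sub, mul), the mean squared error fitness and all function names are hypothetical, and crossover and population handling are omitted.

```python
import random

# Assumed encoding: an int terminal is an input index, a float terminal is a
# constant, and a tuple (op, left, right) is an internal node.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}


def evaluate(tree, inputs):
    """Evaluate an expression tree on one input vector."""
    if isinstance(tree, int):
        return inputs[tree]
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, inputs), evaluate(right, inputs))


def to_string(tree):
    """Render the tree as an explicit, readable formula."""
    if isinstance(tree, int):
        return f"x{tree}"
    if isinstance(tree, float):
        return repr(tree)
    op, left, right = tree
    return f"({to_string(left)} {SYMBOLS[op]} {to_string(right)})"


def mse(tree, samples):
    """Mean squared prediction error over (inputs, target) pairs."""
    return sum((evaluate(tree, v) - y) ** 2 for v, y in samples) / len(samples)


def random_tree(rng, n_inputs, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return rng.randrange(n_inputs)       # input terminal
        return round(rng.uniform(-1.0, 1.0), 3)  # constant terminal
    op = rng.choice(sorted(OPS))
    return (op,
            random_tree(rng, n_inputs, depth - 1),
            random_tree(rng, n_inputs, depth - 1))


def mutate(tree, rng, n_inputs):
    """Subtree mutation: replace one randomly reached subtree with a new one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, n_inputs, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, n_inputs), right)
    return (op, left, mutate(right, rng, n_inputs))


# A hand-built individual encoding 0.5 * x0 + x1 * x1.
model = ("add", ("mul", 0.5, 0), ("mul", 1, 1))
samples = [([2.0, 3.0], 10.0), ([0.0, 1.0], 1.0)]
child = mutate(model, random.Random(0), n_inputs=2)
```

Because each individual is a formula rather than a weight matrix, `to_string(model)` gives an expression that can be inspected directly, which is the kind of explicit structure the text contrasts with neural network models.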

GP is suited to speech signal processing for the following four reasons. First, compared with LPC, GP captures the nonlinear characteristics of the signal. Second, compared with the artificial neural network, it yields an explicit model structure, which facilitates analysis, optimization and application. Third, compared with other nonlinear models, a GP model covers the whole signal without treating the linear and nonlinear parts separately. Finally, compared with the Volterra model, it overcomes the weak-nonlinearity limitation of such models, and it is more streamlined and more efficient.

Considering the chaotic characteristics of the speech signal, this paper provides an improved genetic programming algorithm to establish a nonlinear model of the speech signal. Through model analysis, a model structure with generalization ability can be chosen. By optimizing its parameters, speech signal encoding can be achieved, which provides a new model for the nonlinear analysis of speech signals.

Section snippets

GP algorithm for nonlinear modeling

Compared with other algorithms, the GP algorithm is better able to obtain a nonlinear model with minimum error, and it does not need any prior knowledge. The evolutionary process changes the structure of the model automatically through crossover and mutation, computes the fitness of new individuals with the constructed fitness function, and keeps the best one, steering the evolution toward a better target. As the model produced by the GP algorithm has an explicit structure, it can

Proposed speech coding method

The speech signal is a chaotic system in which order and regularity emerge from disorder and complexity. The chaotic attractor is one of the chaotic characteristics indicating the regularity of a chaotic system, and this inherent regularity shows that the system is predictable. This paper uses the internal regularity of the chaotic system, combined with a nonlinear prediction method (GP), to predict and build models of the speech signal.
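As an illustration of the phase space reconstruction that this approach relies on, the sketch below builds delay vectors from a scalar series using an embedding dimension m and a time delay tau, and pairs each vector with the next sample as a one-step prediction target. The function names and the ordering of components within each vector are assumptions made for this example; the paper's exact formulation is not shown in this excerpt.

```python
from typing import List, Sequence, Tuple


def delay_embed(x: Sequence[float], m: int, tau: int) -> List[List[float]]:
    """Return delay vectors [x[n], x[n - tau], ..., x[n - (m - 1) * tau]]."""
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    start = (m - 1) * tau  # first index that has a full history
    return [[x[n - k * tau] for k in range(m)] for n in range(start, len(x))]


def one_step_pairs(
    x: Sequence[float], m: int, tau: int
) -> List[Tuple[List[float], float]]:
    """Pair each delay vector with the sample that follows it."""
    start = (m - 1) * tau
    vectors = delay_embed(x, m, tau)
    # The last vector has no successor in the series, so it is dropped.
    return [(v, x[start + i + 1]) for i, v in enumerate(vectors[:-1])]


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
vectors = delay_embed(signal, m=3, tau=2)
pairs = one_step_pairs(signal, m=3, tau=2)
```

A predictor, whether linear or evolved by GP, is then fitted to map each delay vector to its target sample.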

In this paper, the speech signal processing flow chart, as shown in

Prediction model of speech signal based on improved GP algorithm

When the traditional linear prediction model is used for speech coding, the model structure is fixed. Only the parameters, which differ from signal to signal, are transmitted and then decoded at the receiver.

Wu and Yang [19] improved the GP algorithm in two ways. First, several groups are used when initializing the population, to increase the diversity of solutions and improve the global search capability. Second, the hill-climbing algorithm is

Experiments

In our experiment, the samples are recordings of different pronunciations of given English phonemes and words, sampled at 8 kHz and linearly quantized with 8-bit PCM. Environmental factors were strictly controlled during recording, so the collected speech signal has a high signal-to-noise ratio.

During the experiments, the improved GP algorithm is used to establish a nonlinear model of the speech signal for each of the different speech samples.
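The improved GP algorithm combines structure search with hill climbing, but this excerpt does not specify how the hill climbing is applied. As a generic illustration only, the sketch below uses a coordinate-wise hill climber to tune the constants of a fixed model structure against sample data. The model form a * x0 + b * x1 * x1, the step schedule and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def hill_climb(
    loss: Callable[[Sequence[float]], float],
    params: Sequence[float],
    step: float = 0.5,
    min_step: float = 1e-6,
    shrink: float = 0.5,
) -> Tuple[List[float], float]:
    """Greedy coordinate search: keep any single-parameter move that lowers
    the loss, and shrink the step when no move helps."""
    best = list(params)
    best_loss = loss(best)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_loss = loss(trial)
                if trial_loss < best_loss:
                    best, best_loss = trial, trial_loss
                    improved = True
        if not improved:
            step *= shrink
    return best, best_loss


# Hypothetical fixed structure: y = a * x0 + b * x1 * x1.
samples = [([1.0, 0.0], 0.5), ([0.0, 1.0], 1.0),
           ([1.0, 1.0], 1.5), ([2.0, 1.0], 2.0)]


def model_loss(p: Sequence[float]) -> float:
    """Mean squared error of the fixed structure with constants p = (a, b)."""
    a, b = p
    return sum((a * x[0] + b * x[1] * x[1] - y) ** 2
               for x, y in samples) / len(samples)


fitted, final_loss = hill_climb(model_loss, [0.0, 0.0])
```

In a combined scheme, a local search of this kind would refine the constants of a candidate structure while crossover and mutation explore alternative structures.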

Conclusion

The speech signal is a chaotic system in which order and regularity can be found in disorder and complexity. In this paper, we reconstructed the phase space of the speech signal time series, based on the chaotic nature of the speech signal. To solve for the model of this chaotic time series, we introduced the GP algorithm combined with a hill-climbing algorithm, which also improves the capability to optimize the model structure and the parameters. Then we

Acknowledgements

The work reported in this paper was supported by the NSF of China (Grant nos. 11172342, 11372167, 11502133), the Program of Shaanxi Science and Technology Innovation Team (Grant no. 2014KTC-18), and the Shaanxi Natural Science Foundation Project (2014JM8353). The authors thank the referees for their valuable suggestions and comments.

References (31)

  • M. Ogorzalek

    Signal coding and compression based on discrete-time chaos: statistical approaches

    IEEE Int. Symp. Circuits Syst.

    (2002)
  • M.L. Seltzer et al.

    An investigation of deep neural networks for noise robust speech recognition

  • A. Miranian et al.

    Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction

    IEEE Trans. Neural Netw. Learn. Syst.

    (2013)
  • Y. Naniwa et al.

    Study on the artificial synthesis of human voice using radial basis function networks

    Advanced Methods, Techniques, and Applications in Modeling and Simulation

    (2012)
  • S. Scanzio et al.

    Parallel implementation of artificial neural network training
