A chaotic time series prediction model for speech signal encoding based on genetic programming
Introduction
Linear Prediction Coding (LPC) has been widely used in speech signal processing. It exploits the correlation between speech samples, using a linear model of the past p sample values to predict the current or future sample value. However, many studies of the nonlinear characteristics of speech have found that the speech signal and its discrete sample sequence exhibit a highly complicated nonlinear process with obvious chaotic characteristics [1], [2], [3], [4]. Nonlinear science has therefore motivated the analysis of speech with nonlinear methods. Nonlinear models for speech coding have become a hot topic in speech signal processing, and applying chaos theory to speech has produced some results [4], [5], [6]. It has been shown that the reconstructed speech signal helps analyze its characteristics more deeply and accurately [6]. Introducing chaotic time series analysis is therefore of considerable theoretical and practical value, not only for studying the chaotic characteristics of speech and the methods for processing it, but also for constructing nonlinear speech signal processing models.
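For reference, a p-th order linear predictor estimates the current sample as ŝ(n) = a_1 s(n−1) + a_2 s(n−2) + … + a_p s(n−p). The sketch below is a minimal illustration, assuming the common autocorrelation method with the coefficients obtained by the Levinson-Durbin recursion; the function names are hypothetical and are not taken from the paper.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def lpc_coefficients(x, p):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns [a_1, ..., a_p] so that x[n] is predicted by sum_i a_i * x[n - i].
    """
    r = autocorrelation(x, p)
    a = [0.0] * (p + 1)   # a[0] is unused; a[1..p] are the predictor taps
    err = r[0]            # prediction-error energy of the order-0 predictor
    for m in range(1, p + 1):
        if err <= 0.0:    # signal already perfectly predictable at a lower order
            break
        # Reflection coefficient for raising the order from m - 1 to m.
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        updated = a[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = a[i] - k * a[m - i]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


def predict(x, coeffs):
    """One-step predictions of x[p], x[p+1], ... from the previous p samples."""
    p = len(coeffs)
    return [sum(coeffs[i] * x[n - 1 - i] for i in range(p)) for n in range(p, len(x))]
```

For a decaying sequence x[n] = 0.9^n, a first-order fit recovers a_1 ≈ 0.9, and a second-order fit leaves a_2 ≈ 0, because the extra tap carries no additional linear information. A speech frame would simply replace x.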
Neural network models are the most commonly used approach to nonlinear speech signal processing and have achieved some success [7], [8], [9]. They have been applied to speech recognition [10], speech transmission [11], and other tasks.
Thyssen et al. [12] regarded the speech signal as a combination of linear and nonlinear components. They modeled the linear part with linear prediction, showed that the residual signal has obvious nonlinear characteristics, and modeled this nonlinear residual with Volterra and neural network methods. Combining the linear and nonlinear models improved coding accuracy. Luka et al. [13] built a speech recognition system for local ethnic languages using neural network technology. The system feeds Mel-frequency cepstral coefficients (MFCCs), extracted with Mel filter banks, into a multilayer neural network, which is trained with conjugate-gradient backpropagation to improve the convergence rate; it showed good recognition ability. Applying chaos theory to speech analysis, Lee and Tong [14] and Aina et al. [15] used traditional local linear models and radial basis function neural networks to analyze chaotic time series, obtaining two kinds of nonlinear speech coding predictors. The results showed that predictors built in the reconstructed phase space performed significantly better than the linear predictor.
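Predictors built in a reconstructed phase space rely on delay-coordinate embedding: each sample index n is mapped to the state vector (x(n), x(n−τ), …, x(n−(m−1)τ)). The sketch below assumes that the embedding dimension m and the delay τ have already been chosen, and for brevity it uses a simple nearest-neighbour (zeroth-order local) predictor rather than the local linear or radial basis function predictors of [14], [15]. The function names are hypothetical.

```python
import math


def delay_embed(x, m, tau):
    """Delay-coordinate vectors (x[n], x[n - tau], ..., x[n - (m - 1) * tau]),
    returned together with their time index n."""
    start = (m - 1) * tau
    return [(n, tuple(x[n - j * tau] for j in range(m))) for n in range(start, len(x))]


def nearest_neighbour_forecast(history, m, tau):
    """Predict the sample that follows `history`.

    The current state is the last delay vector; among earlier states (which all
    have a known successor) the closest one in Euclidean distance is found, and
    the sample that followed it is returned as the forecast.
    """
    states = delay_embed(history, m, tau)
    if len(states) < 2:
        raise ValueError("history too short for the chosen m and tau")
    _, current = states[-1]
    best_index = min(states[:-1], key=lambda s: math.dist(s[1], current))[0]
    return history[best_index + 1]
```

On a periodic test sequence the current state recurs exactly, so the forecast equals the true next sample. For real speech, m and τ would be estimated from the data, and fitting a local model to several neighbours usually gives a better prediction than a single neighbour.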
Neural network models have been used successfully in speech coding and provide a structured prediction model; in recent years they have achieved some success in the nonlinear prediction of speech sequences. However, the structure of a neural network is defined by its optimized weights. Unlike LPC, it does not provide a fixed model structure for describing the speech signal, which hinders analysis and processing, and it must be rebuilt for different samples. This has limited its practical application. To address these problems, this paper introduces GP (Genetic Programming) to build speech signal prediction models. Compared with neural networks, GP yields an explicit model structure, which facilitates analysis and application. As early as 1998, Conrads et al. [16] applied GP to speech and found that it could evolve programs that discriminate certain spoken vowels and consonants. Xie and Zhang [17], [18] carried out a series of studies on rhythmic stress detection in spoken English using GP. Later, GP was proposed for constructing nonlinear speech coding models [19].
GP is an optimization algorithm derived from the genetic algorithm (GA) [20]. It describes programming problems with a hierarchical structure: problems from different fields are cast as a search for computer programs that satisfy given constraints. From a limited number of samples, GP can estimate the model structure and its parameters simultaneously. In recent years, it has been widely used in nonlinear system modeling [21], data analysis [22], and other areas.
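GP's hierarchical structure is usually an expression tree whose internal nodes are functions and whose leaves are inputs or constants. The sketch below is an illustrative assumption, not the encoding used in this paper: a candidate predictor is a nested tuple over hypothetical delayed-sample terminals x1 = s(n−1), x2 = s(n−2), evaluated recursively, with the protected division that GP systems commonly use.

```python
import math

# Function set: name -> (arity, implementation). Division is "protected",
# returning 1.0 for a near-zero denominator, which is a common GP convention.
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, lambda a, b: a / b if abs(b) > 1e-9 else 1.0),
    "sin": (1, math.sin),
}
SYMBOLS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}


def evaluate(tree, inputs):
    """Evaluate a tree that is a numeric constant, a terminal name such as
    'x1' (looked up in `inputs`), or a tuple (function_name, child, ...)."""
    if isinstance(tree, (int, float)):
        return float(tree)
    if isinstance(tree, str):
        return inputs[tree]
    name, *children = tree
    arity, fn = FUNCTIONS[name]
    if len(children) != arity:
        raise ValueError(f"{name} expects {arity} argument(s)")
    return fn(*(evaluate(child, inputs) for child in children))


def to_string(tree):
    """Render the tree as an explicit infix formula."""
    if not isinstance(tree, tuple):
        return str(tree)
    name, *children = tree
    if len(children) == 1:
        return f"{name}({to_string(children[0])})"
    left, right = (to_string(c) for c in children)
    return f"({left} {SYMBOLS[name]} {right})"
```

For example, the tree ("add", ("mul", 1.8, "x1"), ("mul", -0.9, "x2")) renders as ((1.8 * x1) + (-0.9 * x2)), which is a second-order linear predictor. A linear model is therefore just one particular tree, while GP is free to add nonlinear nodes such as mul(x1, x1) or sin(x2). The explicit formula is the property that the neural network comparison above refers to.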
GP is suited to speech signal processing for four reasons. First, compared with LPC, GP captures the nonlinear characteristics of the signal. Second, compared with artificial neural networks, it yields an explicit model structure, which facilitates analysis, optimization, and application. Third, compared with other nonlinear models, GP models the signal as a whole, without separating it into linear and nonlinear parts. Finally, compared with the Volterra model, it overcomes the limitation to weak nonlinearity and is more compact and efficient.
Considering the chaotic characteristics of the speech signal, this paper presents an improved genetic programming algorithm for building a nonlinear model of the speech signal; through model analysis, a model structure with generalization ability can be selected. By then optimizing its parameters, speech signal encoding can be achieved, which provides a new model for the nonlinear analysis of speech signals.
GP algorithm for nonlinear modeling
Compared with other algorithms, GP is better able to obtain a nonlinear model with minimum error, and it needs no prior knowledge. During evolution, crossover and mutation change the model structure automatically; the fitness of each new individual is computed with the constructed fitness function, and the best individual is kept, steering evolution toward the target. Because the model produced by the GP algorithm has an explicit structure, it can …
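The evolutionary loop described above (crossover, mutation, fitness evaluation, and retention of the best individual) can be sketched as a generic, minimal GP. This is not the improved algorithm of [19]: the primitive set, the two delayed inputs x1 = s(n−1) and x2 = s(n−2), tournament selection, the node limit against bloat, and all parameter values are assumptions chosen for illustration.

```python
import random

# Hypothetical primitive set: binary arithmetic over two delayed samples.
FUNCS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
TERMINALS = ["x1", "x2"]  # x1 = s(n-1), x2 = s(n-2)


def random_tree(rng, depth):
    """Grow a random tree of at most `depth` function levels."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS) if rng.random() < 0.7 else round(rng.uniform(-2, 2), 2)
    return (rng.choice(list(FUNCS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, env):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree


def nodes(tree, path=()):
    """Yield (path, subtree) for every node; a path lists child positions (1 or 2)."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from nodes(tree[i], path + (i,))


def replace_at(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)


def fitness(tree, samples):
    """Mean squared one-step prediction error (lower is better)."""
    return sum((evaluate(tree, {"x1": a, "x2": b}) - y) ** 2 for a, b, y in samples) / len(samples)


def evolve(samples, pop_size=50, generations=20, max_nodes=40, seed=0):
    """Generational GP with tournament selection, subtree crossover, subtree
    mutation and elitism. Returns the best tree and the best error per generation."""
    rng = random.Random(seed)
    scored = [(fitness(t, samples), t) for t in (random_tree(rng, 3) for _ in range(pop_size))]
    history = []
    for _ in range(generations):
        scored.sort(key=lambda ft: ft[0])
        history.append(scored[0][0])

        def tournament():
            return min(rng.sample(scored, 3), key=lambda ft: ft[0])[1]

        children = [scored[0][1]]  # elitism: the current best always survives
        while len(children) < pop_size:
            mother, father = tournament(), tournament()
            cut, _ = rng.choice(list(nodes(mother)))
            _, donor = rng.choice(list(nodes(father)))
            child = replace_at(mother, cut, donor)            # subtree crossover
            if rng.random() < 0.2:                            # subtree mutation
                spot, _ = rng.choice(list(nodes(child)))
                child = replace_at(child, spot, random_tree(rng, 2))
            if sum(1 for _ in nodes(child)) > max_nodes:      # limit bloat
                child = mother
            children.append(child)
        scored = [(fitness(t, samples), t) for t in children]
    scored.sort(key=lambda ft: ft[0])
    history.append(scored[0][0])
    return scored[0][1], history
```

Because the best individual is copied unchanged into every new generation, the best error in the returned history can never increase. As a hypothetical chaotic test series (not speech), one-step samples from the Hénon map in delay form, s(n) = 1 − 1.4 s(n−1)² + 0.3 s(n−2), can be passed to evolve. The exact model is reachable with this primitive set.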
Proposed speech coding method
The speech signal is a chaotic system that exhibits order and regularity within disorder and complexity. The chaotic attractor is one of the chaotic characteristics that reveal the regularity of a chaotic system, and this inherent regularity means that the system is predictable. This paper exploits the internal regularity of the chaotic system, combined with a nonlinear prediction method (GP), to model and predict the speech signal.
The flow chart of the speech signal processing used in this paper is shown in …
Prediction model of speech signal based on improved GP algorithm
When the traditional linear prediction model is used for speech coding, the model structure is fixed. Encoding therefore only needs to transmit the model parameters, which are decoded at the receiver.
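Because the model structure is fixed and shared by both ends, the transmitter and receiver need to exchange only the parameters and, in a predictive coder, the prediction residual. The closed-loop sketch below assumes integer PCM samples and a second-order linear predictor with hypothetical coefficients. It shows why the decoder can rebuild the signal from the parameters and the residual alone.

```python
def predict_sample(coeffs, signal, n):
    """Integer prediction of signal[n] from signal[n-1], ..., signal[n-p]."""
    return round(sum(a * signal[n - 1 - i] for i, a in enumerate(coeffs)))


def encode(samples, coeffs):
    """Residual e(n) = s(n) - prediction. The first p samples are sent unchanged
    because the predictor has no history for them."""
    p = len(coeffs)
    return list(samples[:p]) + [samples[n] - predict_sample(coeffs, samples, n)
                                for n in range(p, len(samples))]


def decode(residual, coeffs):
    """Rebuild s(n) = prediction + e(n) using only samples already decoded."""
    p = len(coeffs)
    out = list(residual[:p])
    for n in range(p, len(residual)):
        out.append(predict_sample(coeffs, out, n) + residual[n])
    return out
```

Since the decoder forms exactly the same prediction from samples it has already reconstructed, decode(encode(s, c), c) returns s. A practical coder would additionally quantize the residual to save bits, at the cost of an exact round trip.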
Wu and Yang [19] improved the GP algorithm in two ways. First, multiple groups are used when initializing the population, to increase the diversity of solutions and improve global search capability. Second, the hill-climbing algorithm is …
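As a generic illustration of hill climbing, and not necessarily the way [19] applies it, a stochastic hill climber can refine the numeric constants of a fixed model structure. It perturbs one parameter at a time and keeps a change only when the error decreases. The model form, step size, and shrink factor below are assumptions.

```python
import random


def mse(params, model, data):
    """Mean squared error of model(params, x) against the targets y."""
    return sum((model(params, x) - y) ** 2 for x, y in data) / len(data)


def hill_climb(model, params, data, step=0.5, iterations=2000, seed=0):
    """Stochastic hill climbing: perturb one parameter at a time and accept
    the move only if the error strictly decreases; shrink the step on failure."""
    rng = random.Random(seed)
    best = list(params)
    best_err = mse(best, model, data)
    for _ in range(iterations):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)
        err = mse(candidate, model, data)
        if err < best_err:
            best, best_err = candidate, err
        else:
            step = max(step * 0.995, 1e-6)
    return best, best_err
```

Because only improving moves are accepted, the returned error is never worse than the starting error. This makes the method a cheap local complement to GP's global structural search.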
Experiments
In our experiment, the samples consist of different recorded pronunciations of given English phonemes and words, sampled at 8 kHz and linearly quantized with 8-bit PCM. Environmental factors were strictly controlled during recording, so the collected speech signals have a high signal-to-noise ratio.
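To make the recording format concrete, the sketch below quantizes a signal in [−1, 1) to 8-bit linear PCM (256 uniform levels), reconstructs it, and measures the resulting signal-to-quantization-noise ratio. The 440 Hz test tone is hypothetical. For a near-full-scale sinusoid, the usual rule of thumb gives about 6.02 × 8 + 1.76 ≈ 49.9 dB; the exact figure depends on the signal's amplitude and statistics.

```python
import math


def quantize_pcm(x, bits=8):
    """Uniform (linear) quantizer mapping values in [-1, 1) onto 2**bits integer codes."""
    levels = 2 ** bits
    codes = []
    for v in x:
        code = math.floor((v + 1.0) / 2.0 * levels)
        codes.append(min(max(code, 0), levels - 1))   # clip to the valid code range
    return codes


def dequantize_pcm(codes, bits=8):
    """Map each code back to the centre of its quantization interval."""
    levels = 2 ** bits
    return [(c + 0.5) / levels * 2.0 - 1.0 for c in codes]


def snr_db(reference, approx):
    """Signal-to-noise ratio in dB, treating reference - approx as noise."""
    signal = sum(v * v for v in reference)
    noise = sum((v - a) ** 2 for v, a in zip(reference, approx))
    return 10.0 * math.log10(signal / noise)
```

With one second of a 0.99-amplitude tone at 8 kHz, every reconstruction error stays within half a step (1/256), and the measured ratio lands close to the 50 dB estimate. Real speech, whose level sits well below full scale most of the time, would typically see a lower figure.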
During the experiments, the improved GP algorithm is used to build a nonlinear model of the speech signal for each speech sample.
Conclusion
The speech signal is a chaotic system in which order and regularity can be found within disorder and complexity. In this paper, we reconstructed the phase space of the speech signal time series, based on the chaotic nature of the speech signal. To model such chaotic speech time series, we introduced a GP algorithm combined with the hill-climbing algorithm, which also improves its ability to optimize both the model structure and its parameters. Then we …
Acknowledgements
The work reported in this paper was supported by the NSF of China (Grant nos. 11172342, 11372167, 11502133), the Program of Shaanxi Science and Technology Innovation Team (Grant no. 2014KTC-18), and the Shaanxi Natural Science Foundation Project (2014JM8353). The authors thank the referees for their valuable suggestions and comments.
References (31)
- Enhancement of Chinese speech based on nonlinear dynamics. Signal Process., 2007.
- Forecasting time series using a methodology based on autoregressive integrated moving average and genetic programming. Knowl.-Based Syst., 2011.
- Nonlinear speech coding model based on genetic programming. Appl. Soft Comput., 2013.
- Practical method for determining the minimum embedding dimension of a scalar time series. Phys. D: Nonlinear Phenom., 1997.
- Nonlinear dynamics, delay times, and embedding windows. Phys. D: Nonlinear Phenom., 1999.
- A practical method for calculating largest Lyapunov exponents from small data sets. Phys. D: Nonlinear Phenom., 1993.
- Chaotic-type features for speech steganalysis. IEEE Trans. Inf. Forensics Security, 2008.
- Nonlinear dynamical analysis of normal voices.
- The nonlinear characteristics of the speech signal. J. PLA Univ. Sci. Technol., 2000.
- Nonlinear speech analysis using models for chaotic systems. IEEE Trans. Speech Audio Process., 2005.
- Signal coding and compression based on discrete-time chaos: statistical approaches. IEEE Int. Symp. Circuits Syst.
- An investigation of deep neural networks for noise robust speech recognition.
- Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction. IEEE Trans. Neural Netw. Learn. Syst.
- Study on the artificial synthesis of human voice using radial basis function networks. Advanced Methods, Techniques, and Applications in Modeling and Simulation.
- Parallel implementation of artificial neural network training.