Nonlinear speech coding model based on genetic programming
Highlights
► An improved genetic programming algorithm is proposed for speech modeling.
► We obtain a normalized nonlinear model that is effective for speech coding.
► A new speech coding process is completed using an improved PSO (UPSO) algorithm.
► We provide a novel method for nonlinear speech processing.
Introduction
Speech coding transforms analog speech signals into digital signals according to a given set of rules. The basic methods of speech coding are waveform coding [1], parametric coding [2] and hybrid coding [3]. Waveform coding tries to stay consistent with the original waveform, which gives it strong adaptability and high speech quality, but it requires a higher transmission bit rate. Parametric coding is based on an analysis of how speech is generated: it constructs a model of the generation mechanism, under the principle that the decoded speech signals remain intelligible. Because this method does not need to match the original waveform, it achieves a lower transmission bit rate, but it is more sensitive to environmental noise and the synthesized speech is of relatively poor quality. Formant coding and linear predictive coding are typical approaches to parametric coding. Linear prediction in particular is a commonly used technique in speech processing and has been applied successfully to speech recognition [4], speech coding [5], etc.
Further research shows that speech signals are time-varying time series with many nonlinear characteristics [6], so linear prediction cannot meet the demands of modern speech processing. With the development of nonlinear theories, a few approaches, such as neural networks, have been widely used in speech processing, and nonlinear methods have become a hotspot in the field [7]. In this paper, nonlinear models of speech signals are constructed using genetic programming.
Genetic programming (GP) [8] is an optimization algorithm developed from the genetic algorithm (GA). GP uses a hierarchical structure, and the solutions to different problems are reduced to corresponding computer programs under given constraints. GP can optimize the structure and the parameters of a model at the same time, which has led to its extensive use in the modeling of nonlinear systems [9], data analysis [10], etc.
In this paper, an improved genetic programming algorithm is proposed to construct nonlinear speech models based on the nonlinear characteristics of speech signals. By analyzing these models, a normalized model with generalization ability is obtained. Finally, speech coding is accomplished by optimizing the parameters of the normalized model with an optimization algorithm. Section 2 introduces related work; Section 3 gives a general description of the proposed speech processing method; Sections 4 and 5 present the improved GP and describe the implementation of the speech coding in detail; Section 6 reports experiments that demonstrate the proposed method.
Section snippets
Linear predictive coding
Linear predictive coding (LPC) assumes an all-pole model of the speech signal, whose parameters are estimated by minimizing the least-squares error in the time domain. LPC describes the spectrum of the speech and the characteristics of the vocal tract well, and it also reduces the bit rate of speech coding. The structure of the LPC model is as follows, where G is the gain; u(n) is the excitation, which is the unit pulse sequence when s(n) is
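As a rough illustration of least-squares estimation for an all-pole predictor, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. This is a textbook procedure chosen for illustration, not necessarily the estimator used in this paper. The predictor form s(n) ≈ a_1 s(n-1) + ... + a_p s(n-p), the function names and the toy frame are all assumptions.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], the unnormalized autocorrelation of a frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation.

    Returns (coeffs, error), where coeffs[i - 1] multiplies s(n - i) in the
    assumed predictor and error is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused, a[i] multiplies s(n - i)
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break  # the frame is already predicted exactly
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Toy frame (assumption): impulse response of a one-pole filter with pole 0.9.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs)  # approximately [0.9, 0.0] for this toy frame
```

For the one-pole toy frame, a second-order predictor recovers the pole as its first coefficient and leaves the second close to zero, which is a convenient sanity check for the recursion.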
Proposed speech coding method
Discrete speech signals are nonlinear time series in which each sample is correlated with its neighbors, a property that the traditional LPC model also reflects. Analysis of speech signals shows that the correlation is largest between adjacent samples. At a sampling rate of 8 kHz, the correlation between adjacent samples is larger than 0.85. Even when two samples are 10 samples apart, the correlation between them also has a
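The correlation between samples a fixed distance apart can be measured as sketched below. This is a minimal example on a synthetic signal, not on the paper's data: the 100 Hz tone, the frame length and the function name lag_correlation are assumptions, and real speech will give different values.

```python
import math


def lag_correlation(samples, lag):
    """Pearson correlation between samples[n] and samples[n + lag]."""
    x = samples[:len(samples) - lag]
    y = samples[lag:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic stand-in for a voiced frame (assumption): 100 Hz tone at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

adjacent = lag_correlation(tone, 1)   # neighbouring samples
distant = lag_correlation(tone, 10)   # samples 10 apart
print(adjacent, distant)
```

For a pure tone the lag-k correlation is close to cos(2πfk/fs), so the adjacent-sample value is near 1 and decreases as the lag grows.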
Improved genetic programming
During evolution, GP can optimize the structure and the parameters of the model at the same time. However, it focuses mainly on the structure, and its ability to optimize the parameters is limited. In the improved GP proposed in this paper, the idea of the hill-climbing algorithm is introduced to improve the parameter optimization. Moreover, considering the particular nature of speech signals, the structure of individuals and the fitness
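The idea of refining the parameters of a fixed model structure by hill climbing can be sketched as follows. This is an illustrative coordinate-wise variant with a shrinking step, not the procedure defined in this paper. The model structure a·s(n-1) + b·s(n-1)·s(n-2), the toy data and all names are hypothetical.

```python
import math


def hill_climb(loss, params, step=0.5, min_step=1e-6, max_iters=10000):
    """Greedy local search: try +/- step on each parameter, keep improvements,
    and halve the step when no move helps."""
    best = list(params)
    best_loss = loss(best)
    iters = 0
    while step > min_step and iters < max_iters:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = best[:]
                candidate[i] += delta
                candidate_loss = loss(candidate)
                if candidate_loss < best_loss:
                    best, best_loss = candidate, candidate_loss
                    improved = True
        if not improved:
            step *= 0.5
        iters += 1
    return best, best_loss


# Toy data (assumption): inputs taken from a sinusoid, targets generated by the
# hypothetical structure with known coefficients a = 0.8 and b = -0.3.
samples = [math.sin(0.3 * n) for n in range(200)]
rows = [(samples[n - 1], samples[n - 2]) for n in range(2, 200)]
targets = [0.8 * x1 - 0.3 * x1 * x2 for x1, x2 in rows]


def frame_loss(params):
    """Mean squared prediction error of the hypothetical model."""
    a, b = params
    return sum((a * x1 + b * x1 * x2 - y) ** 2
               for (x1, x2), y in zip(rows, targets)) / len(rows)


fitted, final_loss = hill_climb(frame_loss, [0.0, 0.0])
print(fitted, final_loss)
```

In a GP setting, a local search of this kind would be applied to the numeric constants of an individual while its tree structure stays unchanged.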
The implementation of speech coding
Speech coding was developed to compress speech signals for transmission. In traditional LPC, the model structure is fixed, and coding is implemented only by optimizing the corresponding parameters for each frame.
In this paper, after the speech signals are pre-processed, GP is used to construct a model of each frame. Then, by analyzing these models, a normalized model with generalization ability is obtained. And finally, the speech coding is
Experiments
The experiments are carried out on different speech samples. The improved GP is used to construct nonlinear models of the samples, and by analyzing the structures of these models, a normalized model with generalization ability is obtained. The DUPSO algorithm is then used to find the optimal parameters for each frame, which completes the speech coding process.
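Per-frame parameter optimization with a particle swarm can be sketched as below. This is a plain global-best PSO and not the DUPSO variant used in the paper, whose details are not given here. The inertia and acceleration values, the model structure, the toy frame and all names are assumptions.

```python
import math
import random


def pso(loss, dim, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_loss[i])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            current = loss(pos[i])
            if current < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], current
                if current < gbest_loss:
                    gbest, gbest_loss = pos[i][:], current
    return gbest, gbest_loss


# Toy frame (assumption): targets follow a hypothetical fixed structure
# a*s(n-1) + b*s(n-1)*s(n-2) with a = 0.8 and b = -0.3.
samples = [math.sin(0.3 * n) for n in range(200)]
rows = [(samples[n - 1], samples[n - 2]) for n in range(2, 200)]
targets = [0.8 * x1 - 0.3 * x1 * x2 for x1, x2 in rows]


def frame_loss(params):
    """Mean squared prediction error of the hypothetical model on the frame."""
    a, b = params
    return sum((a * x1 + b * x1 * x2 - y) ** 2
               for (x1, x2), y in zip(rows, targets)) / len(rows)


best_params, best_loss = pso(frame_loss, dim=2, bounds=(-5.0, 5.0))
print(best_params, best_loss)
```

With a fixed normalized structure, only the optimized parameter vector of each frame would need to be transmitted, which is the sense in which parameter optimization completes the coding step.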
The speech signals in the experiments are chosen from the corpus. During the preprocessing of
Conclusion and future work
Speech signals are nonlinear time series. In this paper, GP is introduced to construct nonlinear models of speech signals and is improved based on the characteristics of speech. The hill-climbing algorithm is used to optimize the parameters locally. By analyzing the models obtained with the improved GP, a normalized nonlinear model with generalization ability is obtained. And an improved PSO algorithm is utilized to optimize the parameters of the normalized model
Acknowledgments
The work reported in this paper was supported by the NSF of China (Grant no. 11172342), the NSF of Shaanxi Province, China (Grant no. 2012JM8043) and the Program for New Century Excellent Talents in University of the Ministry of Education, China (Grant no. NCET-11-0674). The authors thank the referees for their valuable suggestions and comments.
References (23)
- et al. Parallel implementation of artificial neural network training for speech recognition. Pattern Recognition Letters (2010)
- et al. Forecasting time series using a methodology based on autoregressive integrated moving average and genetic programming. Knowledge-Based Systems (2011)
- et al. An adaptive waveform coding algorithm and its application in speech coding. Digital Signal Processing (2012)
- et al. Improved phase parameter analysis and synthesis for parametric stereo audio coding
- et al. Real-time speech coding and decoding for GSM system and its implements in VC
- et al. Suppression of late reverberation effect on speech signal using long-term multi-step linear prediction. IEEE Transactions on Audio, Speech and Language Processing (2009)
- et al. Research on order-variable code exited linear prediction speech coding method. International Symposium on Computer Network and Multimedia Technology (2009)
- et al. Nonlinear speech analysis using models for chaotic systems. IEEE Transactions on Speech and Audio Processing (2005)
- Little. Mathematical foundations of nonlinear, non-Gaussian, and time-varying digital speech signal processing
- Genetic Programming: On the Programming of Computers by Means of Natural Selection (1994)
- Modeling customer satisfaction for product development using genetic programming. Journal of Engineering Design
Cited by (8)
Multiple response optimization: Analysis of genetic programming for symbolic regression and assessment of desirability functions
2019, Knowledge-Based Systems
Citation Excerpt: The great advantage of GP when compared to other nonlinear problem-modeling techniques is the fact that GP can create models with low relative error, and it does not need previous knowledge of the behavior of the dependent and independent variables of the process. There are several applications of GP in problem modeling involving non-linear equations, emphasizing a greater use in forecasting time series [16]. An important hindrance in GP application in mathematical models building lies in the computational effort required [17,18].
A chaotic time series prediction model for speech signal encoding based on genetic programming
2016, Applied Soft Computing Journal
Citation Excerpt: The encoding and decoding of speech signal can only transmit different parameters which will be decoded at the receiver. Wu and Yang [19] make improvements in two ways on the GP algorithm. First, in the initialization of the population, a variety of groups are used, in order to increase the diversity of solutions and improve the global search capability.
Detection of object boundary from point cloud by using multi-population based differential evolution algorithm
2023, Neural Computing and Applications
Hidden phase space reconstruction: A novel chaotic time series prediction method for speech signals
2018, Chinese Journal of Electronics
A classification method for speech signal nonlinear prediction models
2016, Frontiers in Artificial Intelligence and Applications