Nonlinear speech coding model based on genetic programming

doi:10.1016/j.asoc.2013.02.008

Applied Soft Computing

Volume 13, Issue 7, July 2013, Pages 3314-3323

https://doi.org/10.1016/j.asoc.2013.02.008

Abstract

An improved genetic programming (GP) algorithm is proposed in this paper to construct nonlinear models of speech signals, on which speech coding is then based. After the speech signals are preprocessed, the improved GP is used to construct a model of each speech frame. By analyzing these models, a normalized model with generalization ability is obtained. Finally, speech coding is accomplished by optimizing the parameters of the normalized model with an optimization algorithm. Experiments demonstrate the feasibility of the improved GP for modeling speech signals and, in comparisons with linear predictive coding, show the superiority of the proposed method in speech coding.

Highlights

► An improved genetic programming is proposed for speech modeling.
► We obtain a normalized nonlinear model which is effective for speech coding.
► A new process of speech coding is completed using an improved PSO (UPSO) algorithm.
► We provide a novel method for nonlinear speech processing.

Introduction

Speech coding transforms analog speech signals into digital signals according to certain rules. The basic methods of speech coding are waveform coding [1], parametric coding [2] and hybrid coding [3]. Waveform coding tries to stay consistent with the original waveform, which gives it strong adaptability and high speech quality, but it requires a higher transmission bit rate. Parametric coding is based on an analysis of how speech is generated: it constructs a model of the generation mechanism under the principle that the decoded speech must remain intelligible. Because this method does not need to match the original waveform, it achieves a lower transmission bit rate, but it is more sensitive to environmental noise and the synthesized speech is of relatively poor quality. Formant coding and linear predictive coding are typical approaches to parametric coding. Linear prediction in particular is a commonly used technique in speech processing and has been applied successfully to speech recognition [4], speech coding [5], etc.

Further research shows that speech signals are time-varying time series with many nonlinear characteristics [6], so linear prediction cannot meet the demands of modern speech processing. With the development of nonlinear theories, approaches such as neural networks have been widely used in speech processing, and nonlinear methods have become a research hotspot in the field [7]. In this paper, nonlinear models of speech signals are constructed using genetic programming.

Genetic programming (GP) [8] is an optimization algorithm developed from the genetic algorithm (GA). GP uses a hierarchical structure, and the solution to a given problem is expressed as a computer program subject to given constraints. GP can optimize the structure and the parameters of a model simultaneously, which has led to its extensive use in the modeling of nonlinear systems [9], data analysis [10], etc.
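To make the hierarchical structure concrete, the following minimal sketch represents one candidate predictor as an expression tree and evaluates it on past samples. The tuple encoding, the `lag` terminal and the function set are illustrative assumptions; the operator set actually used in the paper is not given in this excerpt.

```python
import math

# Hypothetical function set, chosen only for illustration.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "sin": lambda a: math.sin(a),
}

def evaluate(node, history):
    """Evaluate an expression tree; history[-k] holds the sample s(n-k)."""
    if isinstance(node, (int, float)):
        return float(node)              # numeric constant (a model parameter)
    op, *args = node
    if op == "lag":
        return history[-args[0]]        # terminal: a past sample
    return FUNCTIONS[op](*(evaluate(arg, history) for arg in args))

# Tree for s(n) ~ 0.9*s(n-1) + 0.1*s(n-1)*s(n-2): structure and constants
# are both part of the individual, so GP can evolve them together.
tree = ("add",
        ("mul", 0.9, ("lag", 1)),
        ("mul", 0.1, ("mul", ("lag", 1), ("lag", 2))))

past = [0.5, 0.2]                       # s(n-2) = 0.5, s(n-1) = 0.2
prediction = evaluate(tree, past)
```

In this encoding, crossover and mutation would act on subtrees (the structure), while the numeric leaves play the role of model parameters.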

In this paper, an improved genetic programming algorithm is proposed to construct nonlinear speech models based on the nonlinear characteristics of speech signals. By analyzing these models, a normalized model with generalization ability is obtained. Finally, speech coding is accomplished by optimizing the parameters of the normalized model with an optimization algorithm. The remainder of the paper is organized as follows. Section 2 introduces related work; Section 3 gives a general description of the proposed speech processing method; Sections 4 and 5 present the improved GP and describe the implementation of speech coding in detail; Section 6 reports experiments that demonstrate the proposed method.

Linear predictive coding

Linear predictive coding (LPC) assumes an all-pole model of the speech signal, whose parameters are estimated by minimizing the least-squares error in the time domain. LPC describes the spectrum of speech and the characteristics of the vocal tract well, and it also reduces the bit rate of speech coding. The structure of the LPC model is as follows, $s(n) = \sum_{i=1}^{p} a_i s(n-i) + G u(n)$ where G is the gain; u(n) is the excitation, which is the unit pulse sequence when s(n) is
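As an illustration of how the coefficients $a_i$ of the all-pole model can be estimated, here is a minimal sketch of the autocorrelation method solved with the Levinson-Durbin recursion. This is a standard textbook procedure, not necessarily the estimator used in the paper; the function names and the synthetic second-order test signal are assumptions made for the example.

```python
import random

def autocorrelation(frame, max_lag):
    """r[k] = sum_n frame[n] * frame[n - k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]

def lpc(frame, order):
    """Return ([a_1, ..., a_p], residual_energy) via Levinson-Durbin."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)      # a[i] multiplies s(n - i); a[0] is unused
    err = r[0]                   # prediction error energy at order 0
    for i in range(1, order + 1):
        # Reflection coefficient for the step from order i-1 to order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err

# Synthetic signal (not speech): s(n) = 1.3 s(n-1) - 0.6 s(n-2) + u(n),
# with u(n) white Gaussian noise.
rng = random.Random(0)
signal = [0.0, 0.0]
for _ in range(4000):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + rng.gauss(0.0, 1.0))

coeffs, residual_energy = lpc(signal, order=2)
```

On this signal the estimated `coeffs` should land close to the generating values 1.3 and -0.6, since the data really do follow a linear all-pole model; real speech frames only approximate one.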

Proposed speech coding method

Discrete speech signals are nonlinear time series in which each sample is correlated with its neighbors, as the traditional LPC model also reflects. Analysis of speech signals shows that the correlation is largest between adjacent samples. At a sampling rate of 8 kHz, the correlation between adjacent samples is larger than 0.85. Even for samples that are 10 samples apart, the correlation value between them also has a
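The correlation between a sample and its neighbors can be measured with a normalized autocorrelation. The sketch below is a minimal illustration; the helper name and the synthetic two-tone frame are assumptions, and the frame is only a stand-in for voiced speech, so the values it produces are not the paper's measurements.

```python
import math

def normalized_correlation(x, lag):
    """rho(lag) = sum_n x[n] * x[n - lag] / sum_n x[n]^2."""
    energy = sum(x[n] * x[n] for n in range(len(x)))
    return sum(x[n] * x[n - lag] for n in range(lag, len(x))) / energy

FS = 8000  # sampling rate in Hz, as in the text

# Synthetic stand-in for a voiced frame: a 200 Hz tone plus a weaker
# 400 Hz harmonic, 100 ms long.
frame = [math.sin(2 * math.pi * 200 * n / FS)
         + 0.3 * math.sin(2 * math.pi * 400 * n / FS)
         for n in range(800)]

# Correlation for lags 0..10 samples.
rho = [normalized_correlation(frame, k) for k in range(11)]
```

Running the same function on frames of recorded speech sampled at 8 kHz would be the way to check the adjacent-sample correlation quoted above.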

Improved genetic programming

During the evolutionary process, GP can optimize the structure and the parameters of the model simultaneously. However, it focuses more on optimizing the structure, and its ability to optimize the parameters is limited. In the improved GP proposed in this paper, the idea of the hill-climbing algorithm is introduced to strengthen parameter optimization. Moreover, considering the particularity of speech signals, the structure of individuals and the fitness
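The idea of refining parameters by hill climbing while the structure stays fixed can be sketched as follows. This is a generic stochastic hill climber under stated assumptions, not the authors' exact procedure: the two-parameter model structure, the step size, the iteration count and the synthetic frame are all hypothetical.

```python
import random

def prediction_error(params, frame):
    """Mean squared error of s(n) ~ c1*s(n-1) + c2*s(n-2) over one frame."""
    c1, c2 = params
    errors = [frame[n] - c1 * frame[n - 1] - c2 * frame[n - 2]
              for n in range(2, len(frame))]
    return sum(e * e for e in errors) / len(errors)

def hill_climb(loss, start, step=0.1, iterations=1500, seed=0):
    """Perturb one parameter at a time; keep a change only if the loss drops."""
    rng = random.Random(seed)
    best, best_loss = list(start), loss(start)
    for _ in range(iterations):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss

# Synthetic stand-in for one frame (not speech).
rng = random.Random(1)
frame = [0.0, 0.0]
for _ in range(200):
    frame.append(1.3 * frame[-1] - 0.6 * frame[-2] + rng.gauss(0.0, 1.0))

start = [0.0, 0.0]
start_loss = prediction_error(start, frame)
tuned, tuned_loss = hill_climb(lambda p: prediction_error(p, frame), start)
```

Inside a GP run, a step like this would be applied to the numeric constants of an individual after crossover and mutation have fixed its structure, so that a good structure is not discarded because of poorly tuned constants.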

The implementation of speech coding

Speech coding aims to compress speech signals for transmission. In traditional LPC, the model structure is fixed, and coding is implemented solely by optimizing the corresponding parameters for each frame.

In this paper, after the speech signals are pre-processed, GP is used to construct a model of each frame. By analyzing these models, a normalized model with generalization ability is obtained. And finally, the speech coding is

Experiments

The experiments in this research are carried out on different samples. The improved GP is used to construct nonlinear models of the samples, and by analyzing the structures of these models, a normalized model with generalization ability is obtained. The DUPSO algorithm is then used to find the optimal parameters for each frame, which completes the speech coding process.
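The per-frame parameter search can be illustrated with a plain global-best particle swarm optimizer. This sketch is not the DUPSO variant used in the paper, whose details are not given in this excerpt. The three-term model structure, the swarm settings (inertia 0.7, acceleration constants 1.5), the 160-sample frame length and the synthetic signal are all assumptions.

```python
import random

def model_error(params, frame):
    """MSE of a hypothetical fixed structure:
    s(n) ~ c1*s(n-1) + c2*s(n-2) + c3*s(n-1)*s(n-2)."""
    c1, c2, c3 = params
    total = 0.0
    for n in range(2, len(frame)):
        pred = (c1 * frame[n - 1] + c2 * frame[n - 2]
                + c3 * frame[n - 1] * frame[n - 2])
        total += (frame[n] - pred) ** 2
    return total / (len(frame) - 2)

def pso(loss, dim, particles=20, iterations=80, bounds=(-2.0, 2.0), seed=0):
    """Standard global-best PSO with an inertia weight."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.7, 1.5, 1.5          # conventional settings, not from the paper
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_loss[i])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            current = loss(pos[i])
            if current < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], current
                if current < gbest_loss:
                    gbest, gbest_loss = pos[i][:], current
    return gbest, gbest_loss

# Synthetic signal (not speech), split into 160-sample frames.
rng = random.Random(2)
signal = [0.0, 0.0]
for _ in range(480):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + rng.gauss(0.0, 1.0))
signal = signal[2:]
FRAME = 160
frames = [signal[i:i + FRAME] for i in range(0, len(signal), FRAME)]

# One parameter vector per frame is the "code" that would be transmitted.
codes = [pso(lambda p, f=f: model_error(p, f), dim=3, seed=k)
         for k, f in enumerate(frames)]
baseline = [model_error([0.0, 0.0, 0.0], f) for f in frames]
```

Because the model structure is shared by encoder and decoder, only the few parameters in each entry of `codes` would need to be sent per frame, which is where the compression comes from.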

The speech signals in the experiments are chosen from the corpus. During the preprocessing of

Conclusion and future work

Speech signals are nonlinear time series. In this paper, GP is introduced to construct nonlinear models of speech signals and is improved based on the characteristics of speech. The hill-climbing algorithm is used to optimize the parameters locally. By analyzing the models obtained with the improved GP, a normalized nonlinear model with generalization ability is obtained. And an improved PSO algorithm is utilized to optimize the parameters of the normalized model

Acknowledgments

The work reported in this paper was supported by the NSF of China (Grant no. 11172342), the NSF of Shaanxi Province, China (Grant no. 2012JM8043) and the Program for New Century Excellent Talents in University of the Ministry of Education, China (Grant no. NCET-11-0674). The authors thank the referees for their valuable suggestions and comments.

References (23)

S. Scanzio et al., Parallel implementation of artificial neural network training for speech recognition, Pattern Recognition Letters (2010)
Y.-S. Lee et al., Forecasting time series using a methodology based on autoregressive integrated moving average and genetic programming, Knowledge-Based Systems (2011)
P. Zoran et al., An adaptive waveform coding algorithm and its application in speech coding, Digital Signal Processing (2012)
D.I. Hyun et al., Improved phase parameter analysis and synthesis for parametric stereo audio coding
G. Wan et al., Real-time speech coding and decoding for GSM system and its implements in VC
K. Kinoshita et al., Suppression of late reverberation effect on speech signal using long-term multi-step linear prediction, IEEE Transactions on Audio, Speech and Language Processing (2009)
Y. Cao et al., Research on order-variable code exited linear prediction speech coding method, International Symposium on Computer Network and Multimedia Technology (2009)
I. Kokkinos et al., Nonlinear speech analysis using models for chaotic systems, IEEE Transactions on Speech and Audio Processing (2005)
M.A. Little, Mathematical foundations of nonlinear, non-Gaussian, and time-varying digital speech signal processing
J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (1994)
K. Yan Chan et al., Modeling customer satisfaction for product development using genetic programming, Journal of Engineering Design (2011)
