Elsevier

Applied Soft Computing

Volume 13, Issue 7, July 2013, Pages 3314-3323

Nonlinear speech coding model based on genetic programming

https://doi.org/10.1016/j.asoc.2013.02.008

Abstract

This paper proposes an improved genetic programming (GP) algorithm for constructing nonlinear models of speech signals, which are then used for speech coding. After the speech signals are preprocessed, the improved GP constructs a model for each speech frame. Analysis of these models yields a normalized model with generalization ability. Finally, speech coding is accomplished by optimizing the parameters of the normalized model with an optimization algorithm. Experiments demonstrate the feasibility of the improved GP for modeling speech signals and, through comparisons with linear predictive coding, show the superiority of the proposed method for speech coding.

Highlights

► An improved genetic programming is proposed for speech modeling.
► We obtain a normalized nonlinear model which is effective for speech coding.
► A new process of speech coding is completed using an improved PSO (UPSO) algorithm.
► We provide a novel method for nonlinear speech processing.

Introduction

Speech coding transforms analog speech signals into digital signals according to certain rules. The basic methods of speech coding are waveform coding [1], parametric coding [2] and hybrid coding [3]. Waveform coding tries to reproduce the original waveform, which gives it strong adaptability and high speech quality; however, it requires a higher transmission bit rate. Parametric coding is based on an analysis of how speech is generated: it constructs a model of the generation mechanism, under the principle that the decoded speech signals remain intelligible. This method does not need to match the original waveform, so it has a lower transmission bit rate, but it is more sensitive to environmental noise and the synthesized speech has relatively poor quality. Formant coding and linear predictive coding are typical parametric approaches. Linear prediction in particular is a commonly used technique in speech processing and has been applied successfully to speech recognition [4], speech coding [5], etc.

Further research shows that speech signals are time-varying time series with many nonlinear characteristics [6], and that linear prediction cannot meet the demands of modern speech processing. With the development of nonlinear theories, approaches such as neural networks have been widely used in speech processing, and nonlinear methods have become a hotspot in the field [7]. In this paper, nonlinear models of speech signals are constructed using genetic programming.

Genetic programming (GP) [8] is an optimization algorithm developed from the genetic algorithm (GA). GP uses a hierarchical structure, and the solution to a problem is reduced to a computer program that satisfies given constraints. GP can optimize the structure and the parameters of a model simultaneously, which has led to its extensive use in nonlinear system modeling [9], data analysis [10], etc.
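To make the hierarchical structure concrete, the following minimal sketch represents a GP individual as a nested tuple and evaluates it recursively. The function set, the variable names `x1` and `x2`, and the example tree are assumptions made for illustration; they are not taken from the paper.

```python
import operator

# Function set: name -> (arity, implementation). This set is an assumption
# for illustration; the paper's actual function set is not shown here.
FUNCTIONS = {
    "add": (2, operator.add),
    "sub": (2, operator.sub),
    "mul": (2, operator.mul),
}


def evaluate(node, variables):
    """Recursively evaluate an expression tree.

    A node is a numeric constant, a variable name (str), or a tuple
    (function_name, child, child, ...).
    """
    if isinstance(node, tuple):
        name, *children = node
        arity, func = FUNCTIONS[name]
        if len(children) != arity:
            raise ValueError(f"{name} expects {arity} arguments")
        return func(*(evaluate(child, variables) for child in children))
    if isinstance(node, str):
        return variables[node]
    return float(node)


def size(node):
    """Count the nodes in a tree, a common measure of model complexity."""
    if isinstance(node, tuple):
        return 1 + sum(size(child) for child in node[1:])
    return 1


# Hypothetical individual encoding 0.9 * x1 - 0.2 * (x1 * x2).
tree = ("sub", ("mul", 0.9, "x1"), ("mul", 0.2, ("mul", "x1", "x2")))
value = evaluate(tree, {"x1": 1.0, "x2": 0.5})
nodes = size(tree)
```

In this representation the tuple layout carries the model structure and the numeric leaves carry its parameters, so both can be changed by the evolutionary operators.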

In this paper, an improved genetic programming algorithm is proposed to construct nonlinear speech models based on the nonlinear characteristics of speech signals. Analysis of these models yields a normalized model with generalization ability. Finally, speech coding is accomplished by optimizing the parameters of the normalized model with an optimization algorithm. Section 2 introduces related work; Section 3 gives a general description of the proposed speech processing method; Sections 4 and 5 present the improved GP and describe the implementation of speech coding in detail; Section 6 reports experiments that demonstrate the proposed method.

Section snippets

Linear predictive coding

Linear predictive coding (LPC) is based on the assumption of an all-pole model of the speech signal, whose parameters are estimated by minimizing the least-squares error in the time domain. LPC describes the spectrum of the speech and the characteristics of the vocal tract well, and it can also reduce the bit rate of speech coding. The structure of the LPC model is as follows,

$$s(n) = \sum_{i=1}^{p} a_i s(n-i) + G u(n)$$

where G is the gain; u(n) is the excitation, which is the unit pulse sequence when s(n) is
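As a rough illustration of how the coefficients of such an all-pole model can be estimated for one frame, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. This is a standard textbook procedure and is not claimed to be the exact estimator used in the paper; the function names and the synthetic test frame are assumptions.

```python
import random


def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] with r[k] = sum_n frame[n] * frame[n - k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]


def lpc_coefficients(frame, order):
    """Estimate a_1..a_p of s(n) ~ sum_i a_i * s(n - i) by Levinson-Durbin.

    Returns (coefficients, residual_energy).
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[i] holds a_i; a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the order-i predictor.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


# Synthetic frame (an assumption, not real speech): a stable second-order
# autoregressive process driven by seeded Gaussian noise.
rng = random.Random(0)
frame = [0.0, 0.0]
for _ in range(240):
    frame.append(1.3 * frame[-1] - 0.6 * frame[-2] + rng.gauss(0.0, 1.0))

coeffs, residual = lpc_coefficients(frame, order=2)
```

The recursion solves the normal equations of the least-squares problem without forming or inverting the autocorrelation matrix explicitly.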

Proposed speech coding method

Discrete speech signals are nonlinear time series, and each sample is correlated with its neighbors. The traditional LPC model reflects this as well. Indeed, analysis of speech signals shows that the largest correlation value occurs between adjacent samples. At a sampling rate of 8 kHz, the correlation value of adjacent samples is larger than 0.85. Even when two samples are 10 samples apart, the correlation value between them also has a
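The correlation between samples at a given distance can be measured with a normalized autocorrelation, as in the sketch below. The 200 Hz tone sampled at 8 kHz is a synthetic stand-in chosen for illustration, so its values say nothing about real speech; with recorded speech, the same function would be applied to each frame.

```python
import math


def normalized_correlation(signal, lag):
    """Normalized autocorrelation of the mean-removed signal at `lag`."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(x * x for x in centered)
    if energy == 0.0:
        return 0.0
    cross = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return cross / energy


# Synthetic stand-in for a voiced segment (assumption): 200 Hz tone, 8 kHz rate.
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(800)]

corr_adjacent = normalized_correlation(tone, 1)   # neighboring samples
corr_lag10 = normalized_correlation(tone, 10)     # samples 10 apart
```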

Improved genetic programming

During evolution, GP can optimize the structure and the parameters of the model simultaneously. However, it focuses more on optimizing the structure, and its ability to optimize the parameters is limited. In the improved GP proposed in this paper, the idea of the hill-climbing algorithm is introduced to improve the parameter optimization ability. Moreover, considering the particularity of speech signals, the structure of individuals and the fitness
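The idea of refining parameters by hill climbing while the structure stays fixed can be sketched as follows. This is a generic stochastic hill climber, not the paper's exact procedure; the model form, step size, iteration count and data are all assumptions for illustration.

```python
import random


def mse(params, model, inputs, targets):
    """Mean squared error of model(params, x) over the data."""
    return sum((model(params, x) - t) ** 2
               for x, t in zip(inputs, targets)) / len(targets)


def hill_climb(params, model, inputs, targets,
               step=0.1, iterations=500, seed=0):
    """Local search over constants: keep a perturbation only if the error drops."""
    rng = random.Random(seed)
    best = list(params)
    best_err = mse(best, model, inputs, targets)
    for _ in range(iterations):
        candidate = list(best)
        idx = rng.randrange(len(candidate))
        candidate[idx] += rng.gauss(0.0, step)
        err = mse(candidate, model, inputs, targets)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


def model(p, x):
    """Hypothetical fixed structure p0 * x0 + p1 * x0 * x1 (assumption)."""
    return p[0] * x[0] + p[1] * x[0] * x[1]


# Toy data generated from known constants so the search has a clear target.
inputs = [(0.1 * i, 1.0 - 0.05 * i) for i in range(20)]
targets = [model([0.9, -0.2], x) for x in inputs]
start = [0.0, 0.0]
tuned, tuned_err = hill_climb(start, model, inputs, targets)
```

In a GP loop, a routine of this kind would typically be applied to the numeric leaves of selected individuals between generations, leaving crossover and mutation to explore the structure.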

The implementation of speech coding

Speech coding was developed to compress speech signals for transmission. In traditional LPC, the model structure is fixed, and the coding process is implemented only by optimizing the corresponding parameters for each frame.

In this paper, after the speech signals are preprocessed, GP is used to construct a model of each frame. Analysis of these models then yields a normalized model with generalization ability. And finally, the speech coding is

Experiments

The experiments in this research are carried out on different speech samples. The improved GP is used to construct nonlinear models of the samples, and analysis of the structures of these models yields a normalized model with generalization ability. The DUPSO algorithm is then used to obtain the optimal parameters for each frame, which completes the speech coding process.
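Per-frame parameter optimization with a fixed model structure can be sketched with a basic global-best particle swarm optimizer. This is plain PSO, not the DUPSO variant named above, whose details are not given here; the three-parameter model, the swarm settings and the synthetic frame are assumptions for illustration only.

```python
import math
import random


def frame_error(params, frame):
    """MSE of a hypothetical fixed-structure model on one frame.

    Assumed model: s(n) ~ p0*s(n-1) + p1*s(n-2) + p2*s(n-1)*s(n-2).
    """
    p0, p1, p2 = params
    total = 0.0
    for n in range(2, len(frame)):
        pred = (p0 * frame[n - 1] + p1 * frame[n - 2]
                + p2 * frame[n - 1] * frame[n - 2])
        total += (frame[n] - pred) ** 2
    return total / (len(frame) - 2)


def pso(objective, dim, n_particles=20, iterations=100,
        inertia=0.7, c1=1.5, c2=1.5, bound=2.0, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Synthetic 160-sample frame (assumption, not taken from any speech corpus).
frame = [math.sin(0.3 * n) for n in range(160)]
params, err = pso(lambda p: frame_error(p, frame), dim=3)
```

Under this scheme only the optimized parameter vector of each frame would need to be transmitted, since the model structure is shared by encoder and decoder.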

The speech signals in the experiments are chosen from the corpus. During the preprocessing of

Conclusion and future work

Speech signals are nonlinear time series. In this paper, GP is introduced to construct nonlinear models of speech signals and is improved based on the characteristics of speech. The hill-climbing algorithm is used to optimize the parameters locally. Analysis of the models obtained by the improved GP yields a normalized nonlinear model with generalization ability. And an improved PSO algorithm is utilized to optimize the parameters of the normalized model

Acknowledgments

The work reported in this paper was supported by the NSF of China (Grant no. 11172342), the NSF of Shaanxi Province, China (Grant no. 2012JM8043) and the Program for New Century Excellent Talents in University of the Ministry of Education, China (Grant no. NCET-11-0674). The authors thank the referees for their valuable suggestions and comments.

References (23)

  • S. Scanzio et al.

    Parallel implementation of artificial neural network training for speech recognition

    Pattern Recognition Letters

    (2010)
  • Y.-S. Lee et al.

    Forecasting time series using a methodology based on autoregressive integrated moving average and genetic programming

    Knowledge-Based Systems

    (2011)
  • P. Zoran et al.

    An adaptive waveform coding algorithm and its application in speech coding

    Digital Signal Processing

    (2012)
  • D.I. Hyun et al.

    Improved phase parameter analysis and synthesis for parametric stereo audio coding

  • G. Wan et al.

    Real-time speech coding and decoding for GSM system and its implements in VC

  • K. Kinoshita et al.

    Suppression of late reverberation effect on speech signal using long-term multi-step linear prediction

    IEEE Transactions on Audio, Speech and Language Processing

    (2009)
  • Y. Cao et al.

    Research on order-variable code excited linear prediction speech coding method

    International Symposium on Computer Network and Multimedia Technology

    (2009)
  • I. Kokkinos et al.

    Nonlinear speech analysis using models for chaotic systems

    IEEE Transactions on Speech and Audio Processing

    (2005)
  • M.A. Little

    Mathematical foundations of nonlinear, non-Gaussian, and time-varying digital speech signal processing

  • J.R. Koza

    Genetic Programming: On the Programming of Computers by Means of Natural Selection

    (1994)
  • K. Yan Chan et al.

    Modeling customer satisfaction for product development using genetic programming

    Journal of Engineering Design

    (2011)