Abstract
Estimating speech quality as perceived by humans is vital to the proper functioning of telecommunications networks, where speech can be degraded by various network-related problems. In this paper we present a model for speech quality estimation that is a function of various time- and frequency-domain features of human speech. We employ a hybrid optimization approach: Genetic Programming (GP) finds a suitable structure for the model, while a traditional genetic algorithm (GA) and a numerical method known as linear scaling optimize its coefficients. The proposed model outperforms ITU-T Recommendation P.563, the current non-intrusive speech quality estimation model, in terms of prediction accuracy. It also has significantly reduced dimensionality, which may reduce its computational requirements.
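The linear scaling step named in the abstract can be illustrated with a minimal sketch. It assumes the usual least-squares formulation from scaled symbolic regression, in which the raw outputs y of an evolved expression are mapped to a + b*y with the slope and intercept that minimize squared error against the targets. The function names and the sample data are hypothetical, and this is not the authors' implementation.

```python
def linear_scaling(outputs, targets):
    """Return (a, b) minimizing sum((t - (a + b*y))**2) over paired samples.

    outputs: raw values produced by a candidate expression (hypothetical input)
    targets: the values the model should predict, e.g. subjective quality scores
    """
    n = len(outputs)
    mean_y = sum(outputs) / n
    mean_t = sum(targets) / n
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    if var_y == 0:
        # A constant expression carries no information: predict the target mean.
        return mean_t, 0.0
    cov = sum((y - mean_y) * (t - mean_t) for y, t in zip(outputs, targets))
    b = cov / var_y           # slope
    a = mean_t - b * mean_y   # intercept
    return a, b


def scaled_mse(outputs, targets):
    """Mean squared error after applying the optimal slope and intercept."""
    a, b = linear_scaling(outputs, targets)
    return sum((t - (a + b * y)) ** 2 for y, t in zip(outputs, targets)) / len(outputs)


if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0, 4.0]       # made-up expression outputs
    scores = [3.0, 5.0, 7.0, 9.0]    # made-up targets, here exactly 2*raw + 1
    print(linear_scaling(raw, scores))
    print(scaled_mse(raw, scores))
```

Under this formulation the structure search only has to find an expression whose shape correlates with the target, since the slope and intercept are recovered analytically rather than evolved.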
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Raja, A., Flanagan, C. (2008). Real-Time, Non-intrusive Speech Quality Estimation: A Signal-Based Model. In: O’Neill, M., et al. Genetic Programming. EuroGP 2008. Lecture Notes in Computer Science, vol 4971. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78671-9_4
DOI: https://doi.org/10.1007/978-3-540-78671-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78670-2
Online ISBN: 978-3-540-78671-9
eBook Packages: Computer Science, Computer Science (R0)