Indonesian Journal of Electrical Engineering and Computer Science
Vol 26, No 1: April 2022

Performance analysis of different intonation models in Kannada speech synthesis

Sadashiva Veerappa Chakrasali (Ramaiah Institute of Technology)
Krishnappa Indira (Ramaiah Institute of Technology)
Sunitha Yariyur Narasimhaiah (SJB Institute of Technology)
Shadaksharaiah Chandraiah (Bapuji Institute of Engineering and Technology)



Article Info

Publish Date
01 Apr 2022

Abstract

A text-to-speech (TTS) system generates artificial speech from text input. Prosodic models improve the quality of the synthesized speech, especially its naturalness and intelligibility. Prosody includes intonation, which refers to the variation of the pitch frequency (F0) over time in an utterance. This work concentrates on building a feedback neural network model that predicts the F0 contour of an utterance, using Fujisaki intonation model parameters as the input features, since the Fujisaki intonation model is data driven rather than rule based. We built a 4-layer feedback neural network in the Festival framework. The Kannada speech synthesized with the neural network model is then compared with the output of the classification and regression tree (CART) model and the Tilt model. A database of simple declarative Kannada sentences created by Carnegie Mellon University is used in this work. The study shows that F0 contours can be accurately predicted using the CART and neural network models, whereas naturalness and intelligibility are higher with the CART model than with the neural network model.
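
For readers unfamiliar with the Fujisaki intonation model mentioned above, the following is a minimal sketch of its standard textbook form, in which the logarithm of F0 is a base level plus the summed responses to phrase commands and accent commands. It is not the authors' implementation. The function names, the default constants (alpha = 2.0, beta = 20.0, gamma = 0.9) and the command values in the usage note are illustrative assumptions, not values taken from this work.

```python
import math

def phrase_response(t, alpha=2.0):
    """Phrase-control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent-control step response, capped at the ceiling gamma; 0 for t < 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb      -- base frequency in Hz (assumed value, speaker dependent)
    phrases -- list of (onset_time, amplitude) phrase commands
    accents -- list of (onset_time, offset_time, amplitude) accent commands
    """
    log_f0 = math.log(fb)
    # Each phrase command contributes a slowly decaying hump.
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    # Each accent command contributes a local rise between onset and offset.
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)
```

As a usage illustration with made-up commands, `fujisaki_f0(0.5, 100.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])` evaluates the contour 0.5 s into an utterance with one phrase command and one accent command. In a setup like the one described in the abstract, parameters of this kind would serve as input features to the F0 prediction network.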

Copyrights © 2022