Seminar Nasional Aplikasi Teknologi Informasi (SNATI)
2005

Classification of Indonesian Speech into Voiced-Unvoiced-Silence Using Evolving Feedforward Neural Networks

Suyanto Suyanto



Article Info

Publish Date
02 Oct 2009

Abstract

This paper describes a system for classifying Indonesian speech into voiced, unvoiced, and silence (VUS). In this system, speech sampled at 16 kHz is segmented into frames of 10 milliseconds with 20% overlap. Each frame is then characterized by three time-domain features: frame energy (E), level crossing rate (LCR), and differential level crossing rate (DLCR). Each frame is classified using an Evolving Feedforward Neural Network (EFNN), a feedforward neural network (FNN) trained with evolutionary algorithms (EAs). Finally, the classified frames are concatenated to obtain the final VUS classification. The training data are a combination of 18 consonants and 7 vowels from a single speaker, while the validation and testing data are developed from 25 word utterances representing all combinations of consonants and vowels. Computer simulation shows that the best FNN architecture is 3-10-3 (3 input, 10 hidden, and 3 output units) and that the appropriate number of training samples is 150. This configuration gives a total accuracy of 0.7366, with accuracies for voiced, unvoiced, and silence of 0.6206, 0.6428, and 0.9626, respectively. Since the accuracies for voiced and unvoiced are very low, the performance of the whole VUS system is poor, even after a filtering procedure is applied.

Keywords: Indonesian speech, voiced-unvoiced-silence classification, evolving feedforward neural network
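As a concrete illustration of the pipeline summarized in the abstract, the sketch below frames a 16 kHz signal into 10 ms windows with 20% overlap, computes the three time-domain features (E, LCR, DLCR), and runs a forward pass of a 3-10-3 feedforward network. The level threshold, the DLCR definition (level crossings of the first-difference signal), the sigmoid activations, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def frame_signal(x, sr=16000, frame_ms=10, overlap=0.2):
    """Split a speech signal into 10 ms frames with 20% overlap (per the abstract)."""
    frame_len = int(sr * frame_ms / 1000)        # 160 samples at 16 kHz
    hop = int(frame_len * (1.0 - overlap))       # 128-sample hop for 20% overlap
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def frame_features(frame, level=0.02):
    """Return (E, LCR, DLCR) for one frame.

    E    : short-time energy (sum of squared samples).
    LCR  : level crossing rate, counted as sign changes of (x - level).
    DLCR : the same count applied to the first-difference signal
           (an assumption; the paper may define DLCR differently).
    """
    energy = float(np.sum(frame ** 2))
    lcr = int(np.sum(np.abs(np.diff(np.sign(frame - level))) > 0))
    diff = np.diff(frame)
    dlcr = int(np.sum(np.abs(np.diff(np.sign(diff - level))) > 0))
    return energy, lcr, dlcr

def fnn_forward(features, W1, b1, W2, b2):
    """Forward pass of a 3-10-3 feedforward net (3 inputs, 10 hidden, 3 outputs).

    In an evolving FNN, W1/b1/W2/b2 would be encoded in an EA chromosome and
    evolved rather than trained by backpropagation; the sigmoid activations
    and argmax decision here are assumptions, not the paper's exact setup.
    """
    h = 1.0 / (1.0 + np.exp(-(W1 @ features + b1)))   # 10 hidden units
    o = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))          # voiced / unvoiced / silence
    return int(np.argmax(o))                          # index of the winning class

# Example usage (random weights stand in for EA-evolved ones):
# frames = frame_signal(np.random.randn(16000))
# feats = np.array([frame_features(f) for f in frames])
# label = fnn_forward(feats[0], np.random.randn(10, 3), np.zeros(10),
#                     np.random.randn(3, 10), np.zeros(3))
```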

Copyright © 2005