Articles

Found 2 Documents

Signal Enhancement by Single Channel Source Separation
Bagus Tris Atmaja; Dhany Arifianto
IPTEK Journal of Proceedings Series No 1 (2015): 1st International Seminar on Science and Technology (ISST) 2015
Publisher : Institut Teknologi Sepuluh Nopember

DOI: 10.12962/j23546026.y2015i1.1081

Abstract

Most gadgets and electronic devices are commonly equipped with only a single microphone. This poses a difficult task in the source separation field, which has traditionally required more sensors than sources to achieve good performance. In this paper we evaluate single channel source separation to enhance a target signal corrupted by interfering noise. The method we use is non-negative matrix factorization (NMF), which decomposes the signal into its components and finds the components that match the target speaker. As an objective evaluation, the coherence score is used to measure the perceptual similarity of the enhanced signal to the original one. The extracted signal has an average coherence of 0.5, indicating a medium correlation between the two signals.
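
The NMF-based enhancement described in this abstract can be sketched roughly as below. This is a minimal NumPy illustration under assumptions, not the authors' implementation: the training spectrograms, component rank, iteration count, and the Wiener-like soft mask are placeholder choices.

    import numpy as np

    def nmf(V, W=None, rank=None, n_iter=200, update_W=True, eps=1e-9):
        # Multiplicative-update NMF of a non-negative matrix V (freq x time).
        # If a basis W is given and update_W is False, only activations H are learned.
        rng = np.random.default_rng(0)
        F, T = V.shape
        if W is None:
            W = rng.random((F, rank)) + eps
        H = rng.random((W.shape[1], T)) + eps
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            if update_W:
                W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    def enhance(V_mix, V_speech_train, V_noise_train, rank=20):
        # Learn bases from (assumed available) clean target-speaker speech and
        # noise, decompose the mixture, and keep the speech-matched components.
        W_s, _ = nmf(V_speech_train, rank=rank)      # target-speaker basis
        W_n, _ = nmf(V_noise_train, rank=rank)       # noise basis
        W = np.hstack([W_s, W_n])
        _, H = nmf(V_mix, W=W, update_W=False)       # activations only
        V_s = W_s @ H[:rank]                         # speech estimate
        V_n = W_n @ H[rank:]                         # noise estimate
        mask = V_s / (V_s + V_n + 1e-9)              # soft mask toward the speaker
        return mask * V_mix                          # enhanced magnitude spectrogram
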
Dimensional Speech Emotion Recognition from Acoustic and Text Features using Recurrent Neural Networks
Bagus Tris Atmaja; Masato Akagi; Reda Elbarougy
International Journal of Informatics, Information System and Computer Engineering (INJIISCOM) Vol 1 No 1 (2020): International Journal of Informatics, Information System and Computer Engineering
Publisher : Universitas Komputer Indonesia

Full PDF (566.864 KB) | DOI: 10.34010/injiiscom.v1i1.4023

Abstract

Emotion can be inferred from tonal and verbal information, and both kinds of features can be extracted from speech. While most researchers have studied categorical emotion recognition from a single modality, this research presents dimensional emotion recognition combining acoustic and text features. A total of 31 acoustic features are extracted from speech, while word vectors are used as text features. The initial results on single-modality emotion recognition serve as a cue for combining both feature sets to improve the recognition result. The latter result shows that a combination of acoustic and text features decreases the error of dimensional emotion score prediction by about 5% compared with the acoustic system and 1% compared with the text system. This smallest error is achieved by processing text features with Long Short-Term Memory (LSTM) networks, processing acoustic features with bidirectional LSTM (BLSTM) networks, and concatenating both systems with dense networks.
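
The fusion architecture described in this abstract can be sketched roughly as below. This is a minimal Keras illustration under assumptions, not the authors' code: sequence lengths, the 300-dimensional word vectors, layer widths, and the MSE loss are assumed values, and the three outputs stand for the dimensional emotion scores (e.g. valence, arousal, dominance).

    from tensorflow.keras import layers, Model

    # Acoustic branch: sequences of the 31 extracted acoustic features.
    acoustic_in = layers.Input(shape=(None, 31), name="acoustic")
    # Text branch: sequences of word vectors (300-d is an assumed size).
    text_in = layers.Input(shape=(None, 300), name="text")

    a = layers.Bidirectional(layers.LSTM(64))(acoustic_in)  # BLSTM over acoustic frames
    t = layers.LSTM(64)(text_in)                            # LSTM over word vectors

    x = layers.Concatenate()([a, t])                        # fuse both modalities
    x = layers.Dense(64, activation="relu")(x)              # dense fusion layer
    out = layers.Dense(3, activation="linear", name="vad")(x)  # 3 dimensional emotion scores

    model = Model([acoustic_in, text_in], out)
    model.compile(optimizer="adam", loss="mse")             # regression on emotion dimensions
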