Performance Analysis of Feature Mel Frequency Cepstral Coefficient and Short Time Fourier Transform Input for Lie Detection using Convolutional Neural Network
Kusumawati, Dewi; Ilham, Amil Ahmad; Achmad, Andani; Nurtanio, Ingrid
JOIV : International Journal on Informatics Visualization, Vol. 8, No. 1 (2024)
Publisher : Society of Visual Informatics

DOI: 10.62527/joiv.8.1.2062

Abstract

This study aims to determine which model is more effective in detecting lies: one based on Mel Frequency Cepstral Coefficients (MFCC) or one based on the Short Time Fourier Transform (STFT), both using a Convolutional Neural Network (CNN). The MFCC and STFT features are computed from digital voice data taken from video recordings that have been labeled as lie or truth with respect to certain situations. The data is then pre-processed and used to train the CNN. Evaluating model performance with hyperparameter tuning via random search shows that using MFCC for voice-data processing yields better performance, with higher accuracy, than using STFT. The best MFCC parameters are convolutional filter 1 = 64, convolutional kernel 1 = 5, convolutional filter 2 = 112, convolutional kernel 2 = 3, convolutional filter 3 = 32, convolutional kernel 3 = 5, dense 1 = 96, optimizer = RMSProp, and learning rate = 0.001, which achieve an accuracy of 97.13% with an AUC of 0.97. With STFT, the best parameters are convolutional filter 1 = 96, convolutional kernel 1 = 5, convolutional filter 2 = 48, convolutional kernel 2 = 5, convolutional filter 3 = 96, convolutional kernel 3 = 5, dense 1 = 128, optimizer = Adadelta, and learning rate = 0.001, which achieve an accuracy of 95.39% with an AUC of 0.95. Prosodic features are also evaluated as a baseline against MFCC and STFT and reach a lower accuracy of 68%. The analysis shows that MFCC feature extraction combined with the CNN model produces the best performance for audio-based lie detection. Further research can optimize the approach by combining CNN architectures such as ResNet, AlexNet, and others to obtain new models and improve lie-detection accuracy.
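
The paper itself does not include code; the sketch below is only an illustration of the kind of pipeline the abstract describes: MFCC extraction from voice clips followed by a three-block CNN using the best MFCC hyperparameters reported (filters 64/112/32, kernels 5/3/5, dense 96, RMSProp, learning rate 0.001). The sampling rate, number of MFCC coefficients, frame padding, file paths, and binary lie/truth labels are assumptions, not details from the article.

# A minimal sketch (not the authors' code) of an MFCC + CNN lie-detection pipeline
# matching the best MFCC configuration reported in the abstract. Input shape,
# sampling rate, and label handling are assumed.

import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

def extract_mfcc(path, sr=16000, n_mfcc=40, max_frames=128):
    """Load an audio clip and return a fixed-size (n_mfcc, max_frames, 1) MFCC map."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate the time axis so every clip has the same shape.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :max_frames]
    return mfcc[..., np.newaxis]

def build_cnn(input_shape):
    """CNN with the best MFCC hyperparameters reported in the abstract."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(112, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(96, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # lie vs. truth
    ])
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model

# Example usage (audio_paths and labels are placeholders):
# X = np.stack([extract_mfcc(p) for p in audio_paths])
# y = np.array(labels)  # 1 = lie, 0 = truth
# model = build_cnn(X.shape[1:])
# model.fit(X, y, epochs=30, batch_size=32, validation_split=0.2)

Swapping extract_mfcc for an STFT magnitude spectrogram (e.g. librosa.stft followed by np.abs) and changing the layer sizes to the STFT configuration in the abstract would give the comparison model; the training loop stays the same.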