Bhuiyan, Abul Bashar
BRAC University, Dhaka, Bangladesh

Bengali Word Detection from Lip Movements Using Mask RCNN and Generalized Linear Model
Bhuiyan, Abul Bashar; Uddin, Jia
Indonesian Journal of Electrical Engineering and Informatics (IJEEI) Vol 12, No 2: June 2024
Publisher : IAES Indonesian Section

DOI: 10.52549/ijeei.v12i2.5088

Abstract

Speech processing aided by lip detection and lip reading is an advancing field. It requires algorithms and techniques that detect the lips and their movements accurately, since lip detection and lip configuration are among the most important parts of visual speech recognition. In this paper, we focus on detecting the lip segment properly. Mask R-CNN (Region-based Convolutional Neural Network) performs object detection and instance segmentation on each video frame to detect the lip segment. Mask R-CNN adds only a small overhead to Faster R-CNN, is simple to train, and runs at 5 frames per second. It also supports keypoint detection, which helps to extract the locations of the lip landmarks pixel by pixel. Once the lip region is extracted and the landmarks are highlighted, we observe how the lip landmarks change over time as the speaker's lips move for each Bengali word. The keypoint changes observed at each millisecond are then used as features to train a GLM (Generalized Linear Model). In addition, we compare the performance of the GLM with Naive Bayes, Logistic Regression, and Decision Tree. The GLM achieves the highest accuracy, 91.8%, whereas Naive Bayes, Logistic Regression, and Decision Tree achieve accuracies of 87.1%, 38.3%, and 82.2%, respectively.
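The step of turning landmark movement into classifier features can be illustrated with a minimal sketch. This is not the authors' code: it assumes the lip landmarks have already been extracted per frame by a keypoint detector, and the function name `landmark_displacements`, the two-landmark toy clip, and the choice of raw (dx, dy) differences as features are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) pixel position of one lip landmark
Frame = List[Point]           # all lip landmarks detected in one video frame


def landmark_displacements(frames: List[Frame]) -> List[float]:
    """Flatten the frame-to-frame (dx, dy) change of every landmark
    into a single feature vector for one spoken word.

    Hypothetical helper: the feature layout is an assumption,
    not the representation used in the paper.
    """
    features: List[float] = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("every frame must have the same number of landmarks")
        for (x0, y0), (x1, y1) in zip(prev, curr):
            features.extend([x1 - x0, y1 - y0])
    return features


# Toy clip: 3 frames, 2 landmarks each (made-up coordinates).
clip = [
    [(10.0, 20.0), (30.0, 20.0)],
    [(10.0, 22.0), (30.0, 18.0)],
    [(11.0, 25.0), (29.0, 15.0)],
]
features = landmark_displacements(clip)
print(len(features))  # 2 transitions * 2 landmarks * 2 coordinates
```

A vector like `features`, computed for each recorded word, could then be passed to any of the classifiers compared in the abstract. Clips of different lengths would need padding or resampling first, which this sketch does not cover.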