Journal : Emerging Science Journal

SlowFast-TCN: A Deep Learning Approach for Visual Speech Recognition
Ha, Nicole Yah Yie; Ong, Lee-Yeng; Leow, Meng-Chew
Emerging Science Journal Vol. 8 No. 6 (2024): December
Publisher : Ital Publication

DOI: 10.28991/ESJ-2024-08-06-024

Abstract

Visual Speech Recognition (VSR), commonly referred to as automated lip-reading, is an emerging technology that interprets speech by visually analyzing lip movements. A key challenge in VSR is the homopheme problem, in which different words produce similar lip movements. Visemes are the basic visual units of speech, produced by lip movements and positions, and they typically have shorter durations than words. Consequently, there is less temporal information for distinguishing between viseme classes, which increases visual ambiguity during classification. To address this challenge, viseme classification must not only extract spatial features from lip images but also handle temporal features and visemes of varying durations. This study therefore proposes a new deep learning approach, SlowFast-TCN. A SlowFast network serves as the frontend architecture and extracts spatio-temporal features through its slow and fast pathways. A Temporal Convolutional Network (TCN) serves as the backend architecture and learns from the frontend features to perform the classification. A comparative ablation analysis evaluates the impact of each component of the proposed SlowFast-TCN. The study uses a benchmark English-language dataset, Lip Reading in the Wild (LRW). Two subsets of LRW, comprising homopheme words and unique words, represent the homophemic and non-homophemic datasets, respectively. The proposed approach is evaluated under varying lighting conditions to assess its performance in real-world scenarios, and illumination was found to significantly affect the visual data. Key performance metrics, such as accuracy and loss, are used to evaluate the effectiveness of the proposed approach. The proposed approach outperforms traditional baseline models in accuracy while maintaining competitive execution time. Its dual-pathway architecture effectively captures both long-term dependencies and short-term motions, leading to better performance on both the homophemic and non-homophemic datasets. However, it is less robust under non-ideal lighting, indicating the need for further enhancements to handle diverse lighting scenarios.
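
The two ideas named in this abstract, dual-rate frame sampling and a temporal convolutional backend, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the strides (8 and 2), the 29-frame clip length, and the toy kernel are all assumptions chosen for illustration, and frame indices stand in for real lip-image features.

```python
from typing import List, Sequence, Tuple


def sample_pathways(frames: Sequence, slow_stride: int = 8,
                    fast_stride: int = 2) -> Tuple[list, list]:
    """Split one clip into a sparse 'slow' view and a dense 'fast' view.

    The strides are illustrative assumptions, not values from the paper.
    """
    slow = list(frames[::slow_stride])   # few frames: coarse, long-range context
    fast = list(frames[::fast_stride])   # many frames: fine-grained motion
    return slow, fast


def dilated_causal_conv1d(x: List[float], kernel: List[float],
                          dilation: int = 1) -> List[float]:
    """Causal 1-D convolution, the basic operation inside a TCN layer.

    output[t] depends only on x[t], x[t - dilation], x[t - 2*dilation], ...
    Positions before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:                 # implicit left zero-padding
                acc += weight * x[idx]
        out.append(acc)
    return out


# Hypothetical 29-frame clip; frame indices stand in for image features.
frames = list(range(29))
slow, fast = sample_pathways(frames)

# Toy "backend": smooth the fast-pathway signal with a dilated causal filter.
features = [float(f) for f in fast]
smoothed = dilated_causal_conv1d(features, kernel=[0.5, 0.5], dilation=2)
```

Stacking several such convolutions with growing dilation widens the temporal receptive field without looking at future frames, which is the general reason a TCN suits sequences of varying duration.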
From Teaching to Employability: The Cultural and Performance Pathways to Success
Almaqbali, Said; Meng-Chew, Leow; Shannaq, Boumedyen; Marhoubi, Asmaa H.; Ong, Lee-Yeng
Emerging Science Journal Vol. 9 No. 5 (2025): October
Publisher : Ital Publication

DOI: 10.28991/ESJ-2025-09-05-027

Abstract

This study examines the mediating and moderating effects of Teaching Efficacy (TE) and National Culture (NC) on the relationships between Readiness of Students (RS), Interactive Online Collaboration (IOC), Faculty Training (FT), and Policy Support (PS) and the resulting outcomes of Student Performance (SP), Job Employment (JE), Student Competency (SC), and University Reputation (UR). Direct and indirect associations between these constructs were evaluated using Partial Least Squares Structural Equation Modeling (PLS-SEM) on a sample of 291 respondents surveyed with structured questionnaires. The empirical evidence suggests that TE mediates the relationship between RS, PS, and SP, thereby strengthening the downstream impact on JE, SC, and UR. Notably, the influence of SP on JE is statistically significant when TE acts concurrently (O for indirect path = 0.215, p < 0.001). Similarly, mediation improved student outcomes on SC (O = 0.327, t = 6.261, p < 0.001) and UR (O = -0.065, t = 1.911, p = 0.028). A significant direct relationship was found between RS and TE (r = 0.282, t = 4.175, p < 0.001). The moderation analysis indicated that Organizational Culture exerted a strong influence, positively moderating the relationship between TE and SP (O = 0.087, t = 1.994, p = 0.023). In addition, Information Culture (IC) acted as a protective factor, moderating the relationship between RS and TE (O = -0.093, t = 1.945, p = 0.026). Taking TE as the central factor and cultural elements as moderators significantly improved the model's performance, demonstrating that student outcomes and university reputation can be enhanced when institutions combine strong teaching competence with a positive organizational environment.
Enhance Multimodal Retrieval-Augmented Generation Using Multimodal Knowledge Graph
How, Shue-Kei; Ong, Lee-Yeng; Leow, Meng-Chew
Emerging Science Journal Vol. 9 No. 6 (2025): December
Publisher : Ital Publication

DOI: 10.28991/ESJ-2025-09-06-025

Abstract

Large Language Models (LLMs) have shown impressive capabilities in natural language understanding and generation tasks. However, their reliance on text-only input limits their ability to handle tasks that require multimodal reasoning. To overcome this, Multimodal Large Language Models (MLLMs) have been introduced, accepting inputs such as images, text, video, and audio. While MLLMs address some limitations, they often hallucinate because of over-reliance on internal knowledge, and they incur high computational costs. Traditional vector-based multimodal Retrieval-Augmented Generation (RAG) systems attempt to mitigate these issues by retrieving supporting information, but they often suffer from cross-modal misalignment, where independently retrieved text and image content do not align meaningfully. Motivated by the structured retrieval capabilities of text-based knowledge graph RAG, this paper proposes VisGraphRAG, which addresses this challenge by modelling structured relationships between images and text within a unified Multimodal Knowledge Graph (MMKG). This structure enables more accurate retrieval and better alignment across modalities, resulting in more relevant and complete responses. The experimental results show that VisGraphRAG significantly outperforms the vector database-based baseline RAG, achieving an answer accuracy of 0.7629 compared to 0.6743. Beyond accuracy, VisGraphRAG also performs better on key RAGAS metrics such as multimodal relevance (0.8802 vs 0.7912), showing a stronger ability to retrieve relevant information across modalities. These results underscore the effectiveness of the proposed MMKG methods in enhancing cross-modal alignment and supporting more accurate, context-aware generation in complex multimodal tasks.
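
To make the contrast with independent vector retrieval concrete, the sketch below shows one simple way text and image nodes can be linked in a single graph so that an image is retrieved through its explicit relationship to a matching passage. It is an illustrative assumption, not VisGraphRAG itself: the `Node` and `MultimodalGraph` classes, the keyword-overlap scoring, and the toy data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    node_id: str
    modality: str   # "text" or "image"
    content: str    # passage text, or an image path


@dataclass
class MultimodalGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, set())

    def link(self, a: str, b: str) -> None:
        """Record an undirected relationship between two nodes."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def retrieve(self, query: str, top_k: int = 1) -> List[Tuple[Node, List[Node]]]:
        """Rank text nodes by keyword overlap, then follow edges to images."""
        terms = set(query.lower().split())

        def score(node: Node) -> int:
            return len(terms & set(node.content.lower().split()))

        text_nodes = [n for n in self.nodes.values() if n.modality == "text"]
        ranked = sorted(text_nodes, key=score, reverse=True)
        results = []
        for hit in ranked[:top_k]:
            if score(hit) == 0:
                continue  # no overlap with the query: skip
            images = [self.nodes[i] for i in sorted(self.edges[hit.node_id])
                      if self.nodes[i].modality == "image"]
            results.append((hit, images))
        return results


# Toy data (hypothetical): two passages, each linked to one image.
graph = MultimodalGraph()
graph.add_node(Node("t1", "text", "a red bicycle parked by the river"))
graph.add_node(Node("t2", "text", "a cat sleeping on a sofa"))
graph.add_node(Node("i1", "image", "img/bicycle.jpg"))
graph.add_node(Node("i2", "image", "img/cat.jpg"))
graph.link("t1", "i1")
graph.link("t2", "i2")

results = graph.retrieve("red bicycle")
```

Because the image is reached through an edge from the matched passage rather than through a separate similarity search, the returned text and image are aligned by construction. A real system would replace the keyword overlap with learned embeddings and richer relation types.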
Structure-Aware Chunking for Complex Tables in Retrieval-Augmented Generation Systems
Koay, Xin-Kuang; Ong, Lee-Yeng; Goh, Pey-Yun
Emerging Science Journal Vol. 10 No. 1 (2026): February
Publisher : Ital Publication

DOI: 10.28991/ESJ-2026-010-01-09

Abstract

Retrieval-Augmented Generation (RAG) is a hybrid method that combines information retrieval with large language models to generate context-aware, factually grounded responses. However, RAG systems rely heavily on well-structured input data to generate accurate and contextually relevant responses. Documents with complex table layouts pose significant challenges, as most chunking strategies are text-centric and often overlook table-rich documents containing multi-column and multi-row structures. Hence, this study proposes a customized structure-aware chunking framework specifically designed for university course documents containing multi-column, multi-row tables with nested headers. The framework employs Camelot for high-fidelity table extraction, followed by customized logic that constructs semantically coherent chunks by preserving academic term, subject name, credit hour, and category. This prevents semantic fragmentation during retrieval. The proposed method is evaluated using the RAGAS framework and compared against several baselines that use standard parsing and chunking techniques. Results show that the proposed approach achieves the highest answer accuracy of 0.73 and substantially improves retrieval relevance and contextual precision. These findings demonstrate the framework’s effectiveness in handling structure-dependent academic queries. This study highlights that both parsing quality and chunking strategy are essential to retaining semantic relationships in table-rich documents, offering a practical improvement for RAG systems in structurally complex scenarios.
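
The chunking idea described here, keeping each subject together with its academic term, category, and credit hours, can be sketched as follows. This is a hedged illustration rather than the paper's framework: it assumes the table rows have already been extracted (for example with Camelot) into dictionaries, that merged header cells appear as empty strings, and the field names, helper functions, and sample course data are all hypothetical.

```python
from typing import Dict, List


def forward_fill(rows: List[Dict[str, str]], keys: List[str]) -> List[Dict[str, str]]:
    """Copy the last seen value of each key into rows where it is blank.

    Merged cells (e.g. a term label spanning several rows) usually come out
    of table extraction as empty strings on all but the first row.
    """
    last_seen: Dict[str, str] = {}
    filled = []
    for row in rows:
        new_row = dict(row)
        for key in keys:
            if new_row.get(key):
                last_seen[key] = new_row[key]
            else:
                new_row[key] = last_seen.get(key, "")
        filled.append(new_row)
    return filled


def row_to_chunk(row: Dict[str, str]) -> str:
    """Serialise one table row as a self-contained, retrievable chunk."""
    return (f"Term: {row['term']} | Category: {row['category']} | "
            f"Subject: {row['subject']} | Credit hours: {row['credit_hours']}")


# Hypothetical extracted rows; blank cells come from merged header cells.
rows = [
    {"term": "Year 1 Trimester 1", "category": "Core",
     "subject": "Programming Fundamentals", "credit_hours": "4"},
    {"term": "", "category": "",
     "subject": "Discrete Mathematics", "credit_hours": "3"},
    {"term": "", "category": "Elective",
     "subject": "Introduction to Design", "credit_hours": "3"},
]

chunks = [row_to_chunk(r) for r in forward_fill(rows, ["term", "category"])]
```

Each chunk now carries its own term and category, so a query such as "Which core subjects are offered in Year 1 Trimester 1?" can be answered from a single retrieved chunk instead of depending on a header that a text-centric splitter may have placed in a different chunk.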