Mapping The Landscape of Speech Processing Research: Trends, Insights, and Emerging Directions Mardiana, Ardi; Bastian, Ade; Rifki, Muhamamad; Tresna Irawan, Eka
Jurnal Informatika Universitas Pamulang Vol 10 No 1 (2025): JURNAL INFORMATIKA UNIVERSITAS PAMULANG
Publisher : Teknik Informatika Universitas Pamulang

Abstract

Speech processing has become a significant research domain within signal processing, artificial intelligence, and human-computer interaction. This work conducts a bibliometric analysis to identify research trends, key challenges, and prospective directions in speech processing. Using data from established scientific databases such as Scopus and Web of Science, we assess research output over the last two decades, including publication growth, influential authors, prominent journals, and collaboration networks. The results highlight notable progress in automatic speech recognition, speaker identification, and speech synthesis, alongside persistent challenges related to multilingual datasets, noise robustness, and resource efficiency. Moreover, emerging technologies, such as deep learning and neural architecture search, are recognized as catalysts for future developments. This bibliometric study seeks to provide scholars and practitioners with a thorough overview of the current landscape and strategic insights for advancing the speech processing domain.

The Evolution of Image Captioning Models: Trends, Techniques, and Future Challenges Abrar, Abrar Wahid; Bastian, Ade; Hafsari, Zacky; Mardiana, Ardi
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control Vol. 10, No. 4, November 2025 (Article in Progress)
Publisher : Universitas Muhammadiyah Malang

DOI: 10.22219/kinetik.v10i4.2305

Abstract

This study provides a comprehensive systematic literature review (SLR) of the evolution of image captioning models from 2017 to 2025, with particular emphasis on open challenges, methodological enhancements, and significant architectural developments. The review is motivated by the increasing demand for precise and contextually aware image descriptions, follows the PRISMA methodology, and selects 36 relevant papers from reputable scientific databases. The results indicate a significant transition from traditional CNN-RNN models to Transformer-based architectures, which has led to enhanced semantic coherence and contextual comprehension. Recent techniques, such as prompt engineering and GAN-based augmentation, have further improved generalization and diversity, while multimodal fusion approaches that incorporate attention mechanisms and knowledge integration have improved caption quality. Significant areas of concern include data bias, fairness in model evaluation, and support for low-resource languages. The study underscores that modern vision-language models, such as Flamingo, GIT, and LLaVA, offer robust domain generalization through cross-modal learning and joint embedding. Furthermore, advances in pretraining procedures and lightweight models have improved computational efficiency in resource-constrained environments. This study contributes by identifying future prospects, analyzing technical trade-offs, and delineating research trends, particularly in sectors such as healthcare, construction, and inclusive AI. According to the results, future image captioning models must prioritize resource efficiency, fairness, and multilingual capabilities to be effective in real-world applications.