Jurnal Sains dan Teknologi
Vol. 14 No. 1 (2025): April

A Machine Learning Framework for Automatic Speech Transcription and Summarization Using HMM and TextRank

Kurnia, Yusuf
Kristen
Rossi, Ardiane
Junaedi
Hermawan, Aditiya



Article Info

Publish Date
25 Apr 2025

Abstract

This study is motivated by the growing need to process audio data from meetings, lectures, and interviews efficiently, a task that is usually still performed manually. Manual processing is time-consuming and prone to human error, so an automated system is needed that can convert speech into text and summarize the information accurately. The main objective of this study is to develop an automated system that integrates a Hidden Markov Model (HMM) for speech transcription with TextRank for text summarization, and to evaluate the system's performance. The study uses a quantitative experimental approach; the research subjects are MP3 audio recordings obtained from various activities, such as meetings, lectures, and interviews. The audio is processed by extracting Mel-Frequency Cepstral Coefficient (MFCC) features, which are then transcribed with the HMM, and the resulting transcript is summarized with the TextRank algorithm. Transcription accuracy is measured with the Word Error Rate (WER), and summary quality is evaluated with the ROUGE metric. The system is tested on three audio categories of varying complexity. The results show that the system achieves high transcription accuracy, especially for interview audio (WER: 7.6%), and effective summarization performance (ROUGE-1: 0.78, ROUGE-L: 0.74). Furthermore, the automated workflow improves time efficiency by up to 96% compared with the manual method. These findings demonstrate the practical feasibility of combining probabilistic and graph-based algorithms to automate large-scale audio data processing. The approach significantly reduces human workload while ensuring accuracy and consistency. This research contributes to the advancement of hybrid natural language processing systems and provides a solid foundation for future integration with transformer-based abstractive summarization and for multilingual scalability.
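The abstract reports transcription accuracy as WER and summary quality as ROUGE-1 and ROUGE-L. The following is a minimal sketch of how these scores are commonly computed; it is an illustration, not the authors' evaluation code. It assumes lowercase whitespace tokenization with no punctuation stripping or stemming, and it reports ROUGE as an F1 score, because the abstract does not state whether recall, precision, or F-measure was used. The function names (`wer`, `rouge_1`, `rouge_l`) are hypothetical.

```python
from collections import Counter


def _edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    # prev[j] holds the distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return _edit_distance(ref, hyp) / len(ref)


def _f1(overlap: int, ref_len: int, cand_len: int) -> float:
    """Harmonic mean of precision (overlap/cand_len) and recall (overlap/ref_len)."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_len, overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((Counter(ref) & Counter(cand)).values())
    return _f1(overlap, len(ref), len(cand))


def _lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1: based on the longest in-order (not necessarily contiguous) match."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    return _f1(_lcs_length(ref, cand), len(ref), len(cand))
```

For example, `wer("the meeting starts at nine", "the meeting start at nine")` gives 0.2: one substitution divided by five reference words. Because WER counts insertions, it can exceed 1.0. ROUGE-L only credits words that appear in the same order, so a summary containing the right words in a different order scores lower on ROUGE-L than on ROUGE-1, which is consistent with ROUGE-L being the lower of the two reported scores. Established packages such as jiwer and rouge-score apply their own text normalization, so their numbers may differ slightly from this sketch.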

Copyright © 2025






Journal Info

Abbrev

JST

Subject

Computer Science & IT Education

Description

Jurnal Sains dan Teknologi (JST) is a journal that aims to be a peer-reviewed platform and an authoritative source of information. We publish original research papers, review articles, and case studies focused on Mathematics, Biology, Physics, Chemistry, Informatics, Electronics, and Machinery, as well as related ...