This study is motivated by the growing need to process audio recordings of meetings, lectures, and interviews efficiently; such recordings are still usually transcribed and summarized by hand. Manual processing is time-consuming and prone to human error, so an automated system is needed that converts speech to text and summarizes the content accurately. The main objective is to develop and evaluate an automated system that integrates a Hidden Markov Model (HMM) for speech transcription with TextRank for text summarization. The study takes a quantitative experimental approach, using MP3 audio recordings of meetings, lectures, and interviews as its data. Features are extracted from the audio as Mel-Frequency Cepstral Coefficients (MFCC), the speech is transcribed with the HMM, and the transcript is summarized with the TextRank algorithm. Transcription accuracy is measured with the Word Error Rate (WER), and summary quality is evaluated with the ROUGE metric. The system is tested on three audio categories of varying complexity. The results show high transcription accuracy, especially for interview audio (WER: 7.6%), and effective summarization (ROUGE-1: 0.78, ROUGE-L: 0.74). Furthermore, the automated workflow improves time efficiency by up to 96% compared with the manual method. These findings demonstrate the practical feasibility of combining probabilistic and graph-based algorithms to automate large-scale audio data processing, substantially reducing human workload while maintaining accuracy and consistency. The research contributes to the advancement of hybrid natural language processing systems and provides a foundation for future integration with transformer-based abstractive summarization and for multilingual scalability.
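To make the transcription metric concrete, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the system transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the study's own evaluation code: the function name `word_error_rate` is hypothetical, and the whitespace tokenization without case or punctuation normalization is an assumption, since the abstract does not describe how transcripts were prepared for scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("the meeting starts at noon", "the meeting start at noon")` counts one substitution against five reference words. On this definition, a reported WER of 7.6% corresponds to roughly 7 to 8 word errors per 100 reference words.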