Learning basic nahwu (Arabic grammar) through the study of yellow books still faces various challenges, particularly in interpreting the books objectively and interactively. This study aims to build an automatic book interpretation system that uses forced alignment based on Wav2Vec2 and CTC segmentation to align audio with text. The system is designed to help students prepare to interpret books and learn nahwu, especially the Jurumiyah book. The implementation begins with the extraction and preprocessing of audio and text data. Wav2Vec2 then processes the audio to produce logits whose dimensions correspond to the number of samples, frames, and character tokens. CTC segmentation takes these logits, computes the alignment, handles blank tokens, calculates sequence probabilities, and decodes to text, producing an array of timestamps. The timestamps are validated and normalized, the final result is exported as TextGrid or JSON, and the output is integrated into an interactive website interface. The results indicate that the forced alignment algorithm using the Wav2Vec2 model can align audio and text with fairly high accuracy, which makes it easier for users to understand the contents of the book through audio playback segmented by sentence or chapter. This research is expected to contribute to the development of alignment-based learning media for the yellow books of Islamic boarding schools.
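
To make the alignment step concrete, the following is a minimal sketch of CTC forced alignment followed by timestamp validation and JSON export. It is an illustration under stated assumptions, not the system's actual implementation: the toy emission probabilities stand in for real Wav2Vec2 logits, and the function names, the three-symbol vocabulary, and the 20 ms frame duration are hypothetical choices made for this example.

```python
import json
import math

NEG_INF = float("-inf")
FRAME_SEC = 0.02  # assumed frame duration; the real value depends on the model


def ctc_forced_align(log_probs, tokens, blank=0):
    """Viterbi search over the blank-interleaved target sequence.

    Returns, per frame, the index into `tokens` (or None for a blank frame).
    """
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]  # blank, t1, blank, t2, ..., blank
    T, S = len(log_probs), len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            prev = [s] + ([s - 1] if s >= 1 else [])
            # Skipping a blank is allowed only between two different tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(s - 2)
            best = max(prev, key=lambda p: score[t - 1][p])
            if score[t - 1][best] > NEG_INF:
                score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                back[t][s] = best
    # A valid path ends on the last token or on the trailing blank.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align all tokens")
    path = []
    for t in range(T - 1, -1, -1):
        path.append(None if s % 2 == 0 else s // 2)
        s = back[t][s]
    path.reverse()
    return path


def path_to_segments(path, labels):
    """Merge consecutive frames of the same token into timed segments."""
    segments = []
    for frame, idx in enumerate(path):
        if idx is None:
            continue
        end = round((frame + 1) * FRAME_SEC, 3)
        if segments and segments[-1]["index"] == idx:
            segments[-1]["end"] = end
        else:
            start = round(frame * FRAME_SEC, 3)
            segments.append({"index": idx, "label": labels[idx],
                             "start": start, "end": end})
    return segments


def validate_segments(segments, duration):
    """Clamp segments to [0, duration] and reject empty or overlapping ones."""
    cleaned, prev_end = [], 0.0
    for seg in segments:
        start = min(max(seg["start"], prev_end), duration)
        end = min(max(seg["end"], start), duration)
        if end <= start:
            raise ValueError(f"empty segment for {seg['label']!r}")
        cleaned.append({**seg, "start": start, "end": end})
        prev_end = end
    return cleaned


# Toy emissions: 5 frames over the vocabulary [blank, "a", "b"].
probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1],
         [0.6, 0.2, 0.2], [0.1, 0.1, 0.8]]
log_probs = [[math.log(p) for p in row] for row in probs]

path = ctc_forced_align(log_probs, tokens=[1, 2])
segments = validate_segments(path_to_segments(path, labels=["a", "b"]),
                             duration=round(len(path) * FRAME_SEC, 3))
payload = json.dumps(segments, ensure_ascii=False, indent=2)
print(payload)
```

In a real pipeline the `log_probs` matrix would come from the acoustic model's output after a log-softmax, and the resulting segments could be grouped by sentence or chapter before being passed to the web interface.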
Copyright © 2026