Journal of Informatics and Vocational Education
Vol. 9 No. 1 (2026): Journal of Informatics and Vocational Education - March

Automatic Indexing of Digital Books using RAKE and Word2Vec

Arhan Windu Rizki Putra Budianto (Politeknik Negeri Malang)
Ulla Delfana Rosiani (Politeknik Negeri Malang)
Vit Zuraida (Politeknik Negeri Malang)
Rizki Putri Ramadhani (Politeknik Negeri Malang)
Mohammad Alfarizi Abdullah (Politeknik Negeri Malang)



Article Info

Publish Date
26 Jan 2026

Abstract

Manual indexing of digital books is time-consuming and prone to inconsistency. This study addresses the problem with an automatic indexing system that combines the RAKE (Rapid Automatic Keyword Extraction) method with Word2Vec. The system accepts a PDF file as input, preprocesses the text, and extracts key phrases using RAKE. These phrases are then filtered by their semantic relevance to a specified topic using an Indonesian-language Word2Vec model. Users can add phrases manually and select the relevant ones for inclusion in the final index. The resulting index lists phrases, page numbers, and relevance scores, and is inserted as an additional page at the end of the PDF document. The system was evaluated by comparing the generated index with the author’s manual index using precision, recall, and cosine similarity. Although precision and recall were very low, a cosine similarity score of 0.69 suggests that the system output is semantically similar to the author’s index.
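The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the two-dimensional word vectors, the sample text, the topic word, and the 0.8 threshold are all hypothetical stand-ins, and the functions are written from the standard description of RAKE (word score = degree / frequency) rather than from the paper. A real system would load an Indonesian stopword list and a trained Word2Vec model instead of the toy dictionaries below.

```python
import math
import re
from collections import defaultdict

# Hypothetical English stopword list; the study works on Indonesian text.
STOPWORDS = {"the", "of", "and", "a", "is", "to", "in", "for", "are", "by"}

def candidate_phrases(text):
    """Split text at punctuation and stopwords into candidate phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(phrases):
    """Score each phrase as the sum of degree/frequency over its words."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def phrase_vector(phrase, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in phrase.split() if w in vectors]
    if not known:
        return None
    return [sum(v[i] for v in known) / len(known) for i in range(len(known[0]))]

def filter_by_topic(phrases, topic, vectors, threshold):
    """Keep phrases whose cosine similarity to the topic meets the threshold."""
    topic_vec = phrase_vector(topic, vectors)
    kept = {}
    for phrase in phrases:
        vec = phrase_vector(phrase, vectors)
        if vec is not None and topic_vec is not None:
            sim = cosine(vec, topic_vec)
            if sim >= threshold:
                kept[phrase] = sim
    return kept

def precision_recall(system, reference):
    """Exact-match precision and recall between two sets of index phrases."""
    hits = len(system & reference)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

# Toy vectors standing in for a trained Word2Vec model (assumption).
TOY_VECTORS = {
    "neural": [1.0, 0.0], "networks": [0.9, 0.1],
    "gradient": [0.8, 0.2], "descent": [0.7, 0.3],
    "sugar": [0.0, 1.0], "butter": [0.1, 0.9],
    "learning": [1.0, 0.1],
}

text = ("Neural networks are trained by gradient descent. "
        "The recipe uses sugar and butter.")
scores = rake_scores(candidate_phrases(text))
kept = filter_by_topic(scores, "learning", TOY_VECTORS, threshold=0.8)
```

Exact-match precision and recall penalize any difference in wording between the system's phrases and the author's, whereas cosine similarity over word vectors rewards phrases that are close in meaning. That difference is one plausible reading of why the abstract reports very low precision and recall alongside a cosine similarity of 0.69.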

Copyrights © 2026






Journal Info

Abbrev

joive

Publisher

Subject

Computer Science & IT, Education, Social Sciences

Description

The Journal of Informatics and Vocational Education (JOIVE) is committed to advancing the understanding of applied computer science education, with a particular focus on the integration of informatics in vocational training and the development of innovative teaching and learning methodologies. ...