Lingua Technica: Journal of Digital Literary Studies
Vol. 2 No. 1 (2026): Literature and computation: mapping, modeling, and mediation

Text mining and semantic modeling of literary corpora: a machine learning–based study of Indonesian fiction

Rinda Widya Ikomah (Universitas Mataram)
Zohaib Hassan Sain (Superior University)



Article Info

Publish Date
30 Jan 2026

Abstract

Background: The large-scale digitization of Indonesian literary works has produced extensive textual corpora that challenge conventional close-reading approaches and call for systematic, data-driven methods capable of capturing thematic, semantic, and affective patterns in fiction. Objective: This study aims to examine how text mining and semantic modeling can reveal lexical salience, intertextual relations, and narrative emotion in Indonesian fiction across different thematic orientations. Method: Using a quantitative corpus-based design, the study analyzes 36 Indonesian literary texts published between 1980 and 2022 through TF–IDF–based lexical analysis, document-level semantic embeddings with cosine similarity and clustering, and sentence-level sentiment analysis. Results: The findings show distinct lexical signatures that differentiate thematic clusters, coherent semantic groupings reflecting intertextual proximity, and sentiment trajectories dominated by neutral-to-negative polarity with strategically placed affective peaks across narrative progression. Implication: These results demonstrate that computational methods can empirically support literary analysis without displacing interpretive criticism. Novelty: The study integrates lexical, semantic, and affective modeling within a unified framework for Indonesian fiction, offering a scalable and replicable approach to digital literary studies.
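
The lexical and similarity steps named in the Method can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract specifies document-level semantic embeddings, whereas this example substitutes plain TF-IDF vectors so that it runs with the standard library alone. The function names (`tfidf_vectors`, `cosine`), the smoothed IDF formula, and the toy token lists are all assumptions made for illustration and do not come from the study's corpus.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant) so ubiquitous terms keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy, invented token lists (hypothetical; NOT taken from the 36-text corpus).
toy_docs = [
    "laut ombak nelayan laut".split(),
    "nelayan laut perahu ombak".split(),
    "kota jalan gedung kota".split(),
]

vocab, vecs = tfidf_vectors(toy_docs)
sim_01 = cosine(vecs[0], vecs[1])  # two texts sharing sea-related vocabulary
sim_02 = cosine(vecs[0], vecs[2])  # two texts with no shared vocabulary
```

In this toy setting the first two documents share most of their vocabulary and therefore score higher than the unrelated pair, which is the intuition behind grouping texts by cosine similarity before clustering.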

Copyright © 2026

Journal Info

Abbrev

lingtech

Subject

Arts; Humanities; Computer Science & IT; Language, Linguistic, Communication & Media

Description

This journal covers a wide range of fields, including digital literature, e-poetry, and the relationship between language, literature, and technology in diverse ...