Garuda - Garba Rujukan Digital

Indonesian Journal of Innovation Studies

Vol. 27 No. 1 (2026): January

Khairunnisa, Raissa Araminta (Unknown)
Pujianto, Utomo (Unknown)

Publish Date
04 Jan 2026

General Background: The integration of Generative AI in educational assessment enables rapid construction of large-scale question banks, particularly in programming education, yet raises concerns regarding content validity.

Specific Background: In algorithm and programming domains, Generative AI models frequently assign Higher Order Thinking Skills (HOTS) and Lower Order Thinking Skills (LOTS) labels automatically, creating potential discrepancies with Bloom’s Taxonomy classifications.

Knowledge Gap: Empirical evidence validating the reliability of AI-generated cognitive labels and comparing statistical and transformer-based classification methods on small, domain-specific Indonesian datasets remains limited.

Aims: This study aims to audit the reliability of cognitive labels generated by the Gemini model through expert validation and to compare TF-IDF–SVM and IndoBERT–SVM classifiers under class-imbalanced conditions.

Results: Expert validation revealed substantial mislabeling, with a claimed balanced dataset becoming skewed toward LOTS. Classification experiments using five-fold cross-validation showed that TF-IDF–SVM achieved a slightly higher macro F1-score than IndoBERT–SVM.

Novelty: The study demonstrates that simple lexical representations with stemming can outperform transformer-based embeddings when data are limited and domain-specific.

Implications: These findings emphasize the necessity of human validation in AI-generated assessments and support the use of lightweight statistical text classification for automated cognitive level evaluation in constrained educational contexts.

Highlights
• Generative AI cognitive labels showed substantial inconsistency after expert validation
• Lexical feature representation yielded higher macro-level classification balance
• Human-in-the-loop validation remained essential for programming assessment datasets

Keywords
HOTS; LOTS; Generative AI; Text Classification; TF-IDF
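The abstract reports macro F1-score under class-imbalanced conditions. As a minimal sketch of why that metric is a natural choice when labels are skewed toward LOTS, the example below computes accuracy and macro F1 for a trivial classifier that always predicts the majority class. The toy labels and the helper names (`f1_for_label`, `macro_f1`, `accuracy`) are illustrative assumptions; they are not the study's data or code.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class is never predicted or present.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels skewed toward LOTS (not taken from the study).
y_true = ["LOTS"] * 8 + ["HOTS"] * 2
# A degenerate classifier that always predicts the majority class.
y_majority = ["LOTS"] * 10

acc = accuracy(y_true, y_majority)
mf1 = macro_f1(y_true, y_majority)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

On this toy data the majority-class predictor scores well on accuracy but poorly on macro F1, because the HOTS class contributes an F1 of zero to the unweighted average.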

Journal Info

Indonesian Journal of Innovation Studies

Abbrev

ijins

Publisher

Universitas Muhammadiyah Sidoarjo

Subject

Computer Science & IT; Education; Engineering; Law, Crime, Criminology & Criminal Justice

Description

Indonesian Journal of Innovation Studies (IJINS) is a peer-reviewed journal published by Universitas Muhammadiyah Sidoarjo four times a year. This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global ...

Generative Artificial Intelligence Label Reliability in Programming Assessment: Reliabilitas Label Kecerdasan Buatan Generatif pada Asesmen Algoritma Pemrograman
