Journal of Education and ICT
Vol 8, No 2 (2024)

AN SVM-BASED APPROACH FOR DETECTING DATA DEFINITION LANGUAGE OPERATIONS IN INDONESIAN NATURAL LANGUAGE

Yayak Kartika Sari (Universitas Bhinneka PGRI)
Fahrur Rozi (Universitas Bhinneka PGRI)
Agung Prasetya (Universitas Bhinneka PGRI)



Article Info

Publish Date
01 Dec 2024

Abstract

Text-to-SQL is an approach that enables users to interact with databases using natural language, eliminating the need to understand SQL syntax. However, most existing approaches translate input sentences directly into final SQL queries without explicitly identifying the type of SQL operation involved. This may obscure the distinction between structural and manipulative commands and increase the risk of executing unintended or destructive queries. This study proposes separating the identification of SQL operation types, specifically Data Definition Language (DDL) commands, as a standalone classification task using the Support Vector Machine (SVM) algorithm. Indonesian-language sentences are preprocessed through tokenization, stopword removal, and stemming, then transformed into feature vectors using TF-IDF with unigram and bigram representations. Experiments were conducted on a dataset of 800 Indonesian sentences covering four DDL operations: CREATE, ALTER, DROP, and TRUNCATE. The results show that the proposed SVM model achieved an average accuracy of 93.05%, outperforming baseline models such as Naive Bayes and Random Forest. These findings indicate that early identification of SQL operation types can enhance the accuracy, efficiency, and safety of Text-to-SQL systems. This work also highlights the importance of developing NLP approaches tailored for the Indonesian language in the context of database querying.
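
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the example sentences, and the function names (tokenize, ngrams, tfidf_vectors) are hypothetical, stemming is omitted, and the smoothed IDF formula is one common variant chosen as an assumption. The resulting vectors are what a classifier such as an SVM would then be trained on; that step is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list. A real system would use a
# full Indonesian stopword list and a stemmer; both are omitted here.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "dengan", "untuk"}


def tokenize(sentence):
    """Lowercase, keep runs of ASCII letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens):
    """Return unigrams followed by bigrams (pairs of adjacent tokens).

    Bigrams are built after stopword removal, so they may join words
    that were separated by a stopword in the original sentence.
    """
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_vectors(sentences):
    """Build one L2-normalised TF-IDF dict (term -> weight) per sentence."""
    docs = [Counter(ngrams(tokenize(s))) for s in sentences]
    n_docs = len(docs)
    # Document frequency: number of sentences containing each term.
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Made-up example sentences, one per DDL operation (not from the dataset).
SENTENCES = [
    "buat tabel mahasiswa dengan kolom nama",    # CREATE
    "ubah tabel mahasiswa tambah kolom alamat",  # ALTER
    "hapus tabel mahasiswa dari database",       # DROP
    "kosongkan semua data di tabel mahasiswa",   # TRUNCATE
]

VECTORS = tfidf_vectors(SENTENCES)
```

In this toy corpus, a word such as "buat" that appears in only one sentence receives a higher weight than "tabel", which appears in all four. That weighting is what lets operation-specific verbs dominate the feature vector.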

Copyrights © 2024






Journal Info

Abbrev

joeict

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

This journal encompasses original research articles, review articles, and short communications, including: Information Technology Education; Information System; Artificial Intelligence; AI & Expert Systems; Database Systems; Computing Languages & Algorithms; Computer Networks & Communications; Computer ...