Text-to-SQL is an approach that enables users to interact with databases using natural language, eliminating the need to understand SQL syntax. However, most existing approaches translate input sentences directly into final SQL queries without explicitly identifying the type of SQL operation involved. This may obscure the distinction between structural and manipulative commands and increase the risk of executing unintended or destructive queries. This study proposes separating the identification of SQL operation types, specifically Data Definition Language (DDL) commands, as a standalone classification task using the Support Vector Machine (SVM) algorithm. Indonesian-language sentences are preprocessed through tokenization, stopword removal, and stemming, then transformed into feature vectors using TF-IDF with unigram and bigram representations. Experiments were conducted on a dataset of 800 Indonesian sentences covering four DDL operations: CREATE, ALTER, DROP, and TRUNCATE. The results show that the proposed SVM model achieved an average accuracy of 93.05%, outperforming baseline models such as Naive Bayes and Random Forest. These findings indicate that early identification of SQL operation types can enhance the accuracy, efficiency, and safety of Text-to-SQL systems. This work also highlights the importance of developing NLP approaches tailored for the Indonesian language in the context of database querying.
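The classification pipeline described above (preprocessing, TF-IDF with unigrams and bigrams, then an SVM) can be illustrated with a minimal sketch in Python using scikit-learn. This is not the study's implementation: the stopword list, the toy sentences and labels, the linear kernel, the value of C, and the function names `preprocess` and `build_ddl_classifier` are all assumptions made for illustration, and stemming is left out to keep the example free of extra dependencies.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative stopword list (assumption; not the list used in the study).
STOPWORDS = {"dan", "dari", "dengan", "di", "pada", "yang", "secara"}

# Invented toy examples, two per DDL operation (not taken from the 800-sentence dataset).
TOY_SENTENCES = [
    "buat tabel mahasiswa dengan kolom nama dan nim",
    "buatkan tabel baru bernama produk",
    "tambahkan kolom alamat pada tabel mahasiswa",
    "ubah tipe data kolom harga di tabel produk",
    "hapus tabel mahasiswa dari basis data",
    "hapus tabel produk secara permanen",
    "kosongkan seluruh isi tabel mahasiswa",
    "kosongkan semua data pada tabel produk",
]
TOY_LABELS = [
    "CREATE", "CREATE", "ALTER", "ALTER",
    "DROP", "DROP", "TRUNCATE", "TRUNCATE",
]


def preprocess(sentence):
    """Lowercase, tokenize on word characters, and drop stopwords.

    Stemming is omitted here; an Indonesian stemmer could be applied
    to each token before the tokens are joined back together.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def build_ddl_classifier():
    """Return an unfitted pipeline: TF-IDF (unigrams + bigrams) -> linear SVM."""
    return Pipeline([
        # ngram_range=(1, 2) yields both single words and adjacent word pairs.
        ("tfidf", TfidfVectorizer(preprocessor=preprocess, ngram_range=(1, 2))),
        # Linear kernel and C=10.0 are assumed settings, not reported ones.
        ("svm", LinearSVC(C=10.0)),
    ])


if __name__ == "__main__":
    model = build_ddl_classifier()
    model.fit(TOY_SENTENCES, TOY_LABELS)
    # Classify an unseen request before any SQL is generated.
    print(model.predict(["hapus tabel pelanggan"]))
```

In a Text-to-SQL system, the predicted label could act as a gate: a request classified as DROP or TRUNCATE might, for instance, require explicit confirmation before any query is generated or executed.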
Copyrights © 2024