Subali, Made Agus Putra
Unknown Affiliation

Published: 4 Documents
Articles

Clustering Balinese Language Documents using the Balinese Stemmer Method and Mini Batch K-Means with K-Means++
Subali, Made Agus Putra; Sugiartha, I Gusti Rai Agung; Budiarta, Komang; Adnyana, I Made Budi
Journal of Applied Informatics and Computing Vol. 7 No. 2 (2023): December 2023
Publisher : Politeknik Negeri Batam

DOI: 10.30871/jaic.v7i2.6476

Abstract

Clustering aims to categorize data into n groups so that data within each group are maximally similar while the similarity between groups is minimized. Among the various clustering methods, k-means is widely used because of its simplicity and its ability to yield optimal clustering results. However, k-means is slow on high-dimensional datasets, and its results are sensitive to the initial choice of cluster centers. To address these limitations, this study uses the mini-batch k-means method to speed up processing of high-dimensional data and the k-means++ method to optimize the selection of initial cluster centers. The dataset comprises 300 news articles in Balinese sourced from the https://balitv.tv/ website. Before clustering, the documents are stemmed with the Balinese stemmer method to improve recall. The clustering results show that most of the 300 documents are highly similar to one another. When the number of clusters (n) exceeds two, the data cannot be separated distinctly because of the high structural similarity among the documents. This can be attributed to the relatively small number of words, or attributes, produced. In future research, feature reduction will be implemented, and a clustering method capable of handling overlapping data will be explored.
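The combination described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`kmeanspp_init`, `mini_batch_kmeans`, `assign`), the batch size, the iteration count, and the per-center learning rate `1 / count` are all assumptions chosen for illustration. The seeding step picks each new center with probability proportional to its squared distance from the nearest center already chosen, and the update step moves centers using small random batches instead of the full dataset.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: first center uniform at random, each later center
    drawn with probability proportional to squared distance to the nearest
    center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=weights, k=1)[0]
        centers.append(list(nxt))
    return centers


def mini_batch_kmeans(points, k, batch_size=10, n_iter=100, seed=0):
    """Mini-batch k-means (illustrative): update centers from random batches."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    counts = [0] * k  # how many points each center has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        for p in batch:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            counts[j] += 1
            eta = 1.0 / counts[j]  # assumed per-center learning rate
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            for p in points]
```

In a document-clustering setting, `points` would be the term-weight vectors of the stemmed documents; how the study built those vectors is not stated in the abstract.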

Software Defects Predictions using SQL Complexity and Naïve Bayes
Subali, Made Agus Putra; Sugiartha, I Gusti Rai Agung; Adnyana, I Made Budi; Putra, I Putu Aditya; Subawa, Made Dai
Compiler Vol 14, No 1 (2025): May
Publisher : Institut Teknologi Dirgantara Adisutjipto

DOI: 10.28989/compiler.v14i1.2979

Abstract

Software defects make software unreliable, so predicting defects is one way to produce quality software. In this study, we used the Naïve Bayes method because its characteristics suit the data used. The data comprise the NASA MDP datasets and datasets obtained by applying the SQL complexity method to eight software modules. Two datasets were used because the NASA MDP datasets contain no attributes that account for the use of SQL commands; the datasets from the eight software modules therefore include an SQL complexity attribute, which captures how complex the use of SQL commands is in each module. The predictions were evaluated using accuracy, precision, recall, and F-measure. The resulting accuracy was 88% for CM1, 97% for PC2, and 78% for KC3.
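The four evaluation measures named in this abstract follow directly from the confusion-matrix counts. The sketch below shows the standard definitions for a binary defect label; the function name `classification_metrics` and the convention that `1` means "defective" are assumptions for illustration, and the sample labels in the usage note are invented, not data from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F-measure for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

For example, `classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])` counts two true positives, one false positive, one false negative, and four true negatives, which gives an accuracy of 0.75 and a precision, recall, and F-measure of 2/3 each.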

Kombinasi Metode Rule-Based dan N-Gram Stemming untuk Mengenali Stemmer Bahasa Bali
Subali, Made Agus Putra; Fatichah, Chastine
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 6 No 2: April 2019
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.2019621105

Abstract

Stemming is the process of extracting the stem from an affixed word. It aims to increase recall by reducing the variants of an affixed word to its stem form. Previous research on stemming the Balinese language used a rule-based method, but it removed only prefixes and suffixes; other affix types, such as infixes, confixes, simulfixes, and combinations of affixes, were not removed. Rule-based stemming has been applied to many different languages. The rule-based method is easy to verify and validate when applied to a simple domain, but it is weak in domains with a high level of complexity: if the system cannot recognize a rule, no result is obtained. To overcome this weakness, we use the n-gram stemming method, in which the affixed word and the stem word are converted to n-grams and the similarity between the two n-gram sets is measured with the Dice coefficient. When the similarity meets a defined threshold, the stem word compared against the affixed word is returned. In this study, we developed a stemmer that removes all affix variations in the Balinese language by combining the rule-based approach with the n-gram stemming method. In experiments on ten queries, the proposed method achieved a mean stemming accuracy of 96.67%, compared with 75% for the previous method, and on five queries the n-gram stemming method recognized some affixed words that fall outside the rules. In future work, we will consider the semantics of each word and validate the stemmer in a text mining application.
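The n-gram matching step described in this abstract can be sketched as follows. This is an illustrative reading, not the published algorithm: it assumes character bigrams, treats each word's n-grams as a set, and uses a threshold of 0.5, none of which the abstract specifies. The function names and the sample lexicon in the usage note are hypothetical.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word (assumed: unique n-grams)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_coefficient(a, b, n=2):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over the two n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def ngram_stem(word, lexicon, threshold=0.5, n=2):
    """Return the most similar stem in the lexicon if its Dice similarity
    to the affixed word meets the threshold, otherwise None."""
    best = max(lexicon, key=lambda stem: dice_coefficient(word, stem, n),
               default=None)
    if best is not None and dice_coefficient(word, best, n) >= threshold:
        return best
    return None
```

With the illustrative lexicon `["jalan", "alih"]`, the word `"majalan"` has six bigrams and shares four of them with `"jalan"`, so the Dice coefficient is 2 × 4 / (6 + 4) = 0.8 and `ngram_stem("majalan", ["jalan", "alih"])` returns `"jalan"`. A word with no shared bigrams falls below the threshold and yields `None`, which is where a rule-based pass or a manual check would have to take over.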

Development of SLOC, CC, SQL Complexity Methods to Measure the Level of Similarity Complexity of Software Modules
Subali, Made Agus Putra; Sugiartha, I Gusti Rai Agung; Putra, I Putu Aditya
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol. 9 No. 4 (2023): December
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v9i4.27150

Abstract

Software metrics are often used to reflect vulnerabilities in program code and to measure the complexity of each software module. Knowing the complexity of each module is important because it lets the project manager analyze the defects that may occur, the costs, the work schedule, and the resources needed. In this research, we apply the SLOC, CC, and SQL Complexity methods to measure how similar software modules are in complexity, considering the similarity of the syntactic structure of both the program logic and the SQL commands. Knowing the similarity between modules lets the project manager predict the effort required. For the eight modules, the average similarity level was 90%. This high result arises because the third feature used has a high level of similarity. In further research, other features will be added and each feature will be weighted.