Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer
Vol 6 No 1 (2022): January 2022

Grouping Thesis Topics of Students of the Faculty of Computer Science, Universitas Brawijaya, Based on Titles for the 2015-2019 Period Using the Semi-Supervised K-Means Method

Mochammad Ilman Asnada (Fakultas Ilmu Komputer, Universitas Brawijaya)
Bayu Rahayudi (Fakultas Ilmu Komputer, Universitas Brawijaya)
Achmad Ridok (Fakultas Ilmu Komputer, Universitas Brawijaya)



Article Info

Publish Date
08 Dec 2021

Abstract

A thesis title is a sentence that briefly conveys part of the content of the thesis. The number of research and final projects grows every year, and with so many titles it is likely that some theses address similar or even identical topics. This study therefore groups thesis titles using a program built for that purpose. The grouping results are displayed per year (2015 to 2019) as bar charts, which show how many titles fall into each predetermined topic or category. The collection of thesis titles is turned into a dataset through a text mining workflow whose preprocessing stages are tokenization, filtering, stemming, and term weighting. The dataset is then grouped using the semi-supervised k-means method, an extension of k-means. The method begins by labeling part of the data to determine the initial centroids, after which the data are grouped. Testing used varying amounts of test data for each year. The silhouette value differs from year to year (2015 to 2019); the largest, 0.0274024334, was obtained in 2016 with 30% test data, while the optimal Davies-Bouldin Index (DBI) value, 0.345362812, was obtained in 2015, also with 30% test data. Grouping with an equal amount of training data per label also yields a better silhouette value than grouping with unequal amounts of training data per label.
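
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a seeded variant of k-means in which each initial centroid is the mean of the titles labeled with that topic, uses a plain TF-IDF weighting, and omits stemming. The stopword list, the example titles, and the topic labels "citra" and "pakar" are all hypothetical and serve only to show the flow.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"dan", "pada", "menggunakan", "metode", "untuk", "dengan"}


def preprocess(title):
    """Tokenize and filter one title (stemming is omitted in this sketch)."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Turn tokenized documents into dense TF-IDF vectors."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    return [[Counter(d)[t] * idf[t] for t in vocab] for d in docs]


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(vectors, seeds, max_iter=100):
    """Cluster vectors; seeds maps a document index to its known label.

    Labeled documents only set the initial centroids. They may be
    reassigned later, which is an assumption of this sketch.
    """
    labels = sorted(set(seeds.values()))
    centroids = {
        lab: mean([vectors[i] for i, s in seeds.items() if s == lab])
        for lab in labels
    }
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every document.
        new_assignment = [
            min(labels, key=lambda lab: sq_dist(v, centroids[lab]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: recompute each centroid from its members.
        for lab in labels:
            members = [v for v, a in zip(vectors, assignment) if a == lab]
            if members:
                centroids[lab] = mean(members)
    return assignment


# Hypothetical thesis titles; the first two act as labeled seeds.
titles = [
    "Klasifikasi citra daun menggunakan metode CNN",
    "Sistem pakar diagnosis penyakit dengan metode forward chaining",
    "Segmentasi citra daun dengan metode thresholding",
    "Sistem pakar diagnosis hama pada tanaman padi",
]
seeds = {0: "citra", 1: "pakar"}

vectors = tfidf_vectors([preprocess(t) for t in titles])
clusters = seeded_kmeans(vectors, seeds)
```

Counting the entries of `clusters` per label and per year would give the values plotted in the bar charts the abstract mentions. Indonesian stemming and the silhouette and DBI evaluation are left out to keep the sketch short.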

Copyrights © 2022






Journal Info

Abbrev

j-ptiik

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Education; Electrical & Electronics Engineering; Engineering

Description

Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer (J-PTIIK) of Universitas Brawijaya is a scholarly journal in the field of computing that publishes scientific papers resulting from the research of students of the Faculty of Computer Science, Universitas Brawijaya. The journal is expected to advance research ...