Faiz Anggiananta Winantoro
Fakultas Ilmu Komputer, Universitas Brawijaya

Classification of Active Compound Functions Based on Simplified Molecular Input Line Entry System (SMILES) Notation Using the Random Forest Method
Faiz Anggiananta Winantoro; Dian Eka Ratnawati; Syaiful Anam
Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer Vol 5 No 4 (2021): April 2021
Publisher : Fakultas Ilmu Komputer (FILKOM), Universitas Brawijaya

Abstract

A compound is a pure substance composed of two or more elements joined by chemical bonds. Compounds can be divided into active and inactive compounds; active compounds are those that have physiological effects on other organisms. In Indonesia, the function of many active compounds is still unknown, so a classification method is needed to help determine it. Classification is performed on compounds written in SMILES notation. During preprocessing, features are extracted from each SMILES string, namely the number of occurrences of B, C, N, O, P, S, F, Cl, Br, I, OH, =, #, @, -, +, COC, C=C, O-], N+, C=O, and (). Before classification, each feature count is divided by the length of the SMILES string to obtain its value. This research classifies the function of active compounds by applying the Random Forest (RF) method to SMILES data with 4 classes of compound function. RF was chosen because it is largely resistant to overfitting, can handle data with many features, and is not affected by missing values in the dataset. With 4 classes, the best test accuracy is 69% and the best average accuracy under K-Fold Cross Validation is 63%. With 3 classes, the best accuracy is 76% and the best K-Fold Cross Validation average is 70%. Finally, with 2 classes, the highest accuracy is 86% and the best average is 80%.
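
The feature extraction described in the abstract (counting tokens in a SMILES string and dividing each count by the string length) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `smiles_features` is hypothetical, and several details are assumptions the abstract does not settle: two-letter atoms (Cl, Br) are matched before single letters so that "Cl" is not also counted as "C", only uppercase atom symbols are counted, multi-character patterns are counted as non-overlapping substrings, and the "()" feature is taken to be the number of opening parentheses.

```python
import re
from collections import Counter

# Feature names as listed in the abstract.
ATOMS = ["B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"]
PATTERNS = ["OH", "=", "#", "@", "-", "+", "COC", "C=C", "O-]", "N+", "C=O"]

# Assumption: try two-letter atoms first so "Cl" is not counted as "C".
ATOM_RE = re.compile(r"Cl|Br|[BCNOPSFI]")


def smiles_features(smiles: str) -> dict:
    """Return length-normalized token counts for one SMILES string."""
    if not smiles:
        raise ValueError("empty SMILES string")
    length = len(smiles)

    # Count atom symbols, then divide each count by the string length.
    atom_counts = Counter(ATOM_RE.findall(smiles))
    features = {atom: atom_counts[atom] / length for atom in ATOMS}

    # Assumption: other features are plain, non-overlapping substring counts.
    for pattern in PATTERNS:
        features[pattern] = smiles.count(pattern) / length

    # Assumption: "()" means the number of branches, i.e. opening parentheses.
    features["()"] = smiles.count("(") / length
    return features


if __name__ == "__main__":
    # Acetic acid, used here only as an illustrative input.
    for name, value in smiles_features("CC(=O)O").items():
        print(f"{name}: {value:.3f}")
```

Each compound then becomes a vector of 22 values, which could be passed to any Random Forest implementation together with its class label.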