Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer
Vol 5 No 4 (2021): April 2021

Classification of Active Compound Functions Based on Simplified Molecular Input Line Entry System (SMILES) Notation Using the Random Forest Method

Faiz Anggiananta Winantoro (Fakultas Ilmu Komputer, Universitas Brawijaya)
Dian Eka Ratnawati (Fakultas Ilmu Komputer, Universitas Brawijaya)
Syaiful Anam (Fakultas Matematika dan Ilmu Pengetahuan Alam, Universitas Brawijaya)



Article Info

Publish Date
17 Mar 2021

Abstract

A compound is a single substance composed of two or more elements joined by chemical bonds. Compounds can be divided into active and inactive compounds; active compounds are those that have physiological effects on other organisms. In Indonesia, the function of many active compounds is still unknown, so a classification method is needed to help determine it. Classification is performed on data written in SMILES notation. During preprocessing, features are extracted from each SMILES string: the counts of the atoms B, C, N, O, P, S, F, Cl, Br, and I, and of the symbols and substrings OH, =, #, @, -, +, COC, C=C, O-], N+, C=O, and (). Before being used for classification, each count is divided by the length of the SMILES string to obtain the feature value. This study classifies the function of active compounds by applying the Random Forest (RF) method to SMILES data with 4 classes of compound function. RF was chosen because it rarely overfits, can handle data with many features, and is not affected by missing values in the dataset. On the 4-class data, the best test accuracy is 69% and the best average accuracy under K-Fold Cross Validation is 63%. On the data with 3 classes of compound function, the best accuracy is 76% and the best K-Fold Cross Validation average is 70%. Finally, testing on data with 2 classes of compound function yields a best accuracy of 86% and a best average of 80%.
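The feature extraction described in the abstract (count each token, then divide by the length of the SMILES string) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `smiles_features`, the use of plain substring counting, the subtraction that keeps "Cl" and "Br" from also counting as C and B, and counting "(" as one "()" branch are all assumptions made for illustration. Lowercase aromatic atoms are not counted here, since the abstract does not say how they are treated.

```python
# Tokens as listed in the abstract. How the authors tokenize SMILES and
# resolve overlaps (e.g. "C" inside "Cl") is not stated; this is an assumption.
ATOMS = ["B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"]
SUBSTRINGS = ["OH", "=", "#", "@", "-", "+", "COC", "C=C", "O-]", "N+", "C=O"]


def smiles_features(smiles: str) -> dict:
    """Return length-normalized token counts for one SMILES string."""
    if not smiles:
        raise ValueError("empty SMILES string")

    # Plain, case-sensitive, non-overlapping substring counts.
    counts = {tok: smiles.count(tok) for tok in ATOMS + SUBSTRINGS}

    # Assumption: the "C" of "Cl" is not carbon and the "B" of "Br" is not boron.
    counts["C"] -= counts["Cl"]
    counts["B"] -= counts["Br"]

    # Assumption: the "()" feature is the number of opening parentheses.
    counts["()"] = smiles.count("(")

    # Divide every count by the length of the notation, as the abstract describes.
    length = len(smiles)
    return {tok: c / length for tok, c in counts.items()}


if __name__ == "__main__":
    # Acetyl chloride, used only as a hypothetical example input.
    for token, value in smiles_features("CC(=O)Cl").items():
        print(token, value)
```

The resulting 22-value vector per compound is the kind of input a Random Forest classifier would be trained on; the training and K-Fold Cross Validation steps are omitted here because the abstract gives no details of their configuration.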

Copyrights © 2021






Journal Info

Abbrev

j-ptiik

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Education; Electrical & Electronics Engineering; Engineering

Description

Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer (J-PTIIK) Universitas Brawijaya is a scholarly journal in the field of computing that publishes scientific papers resulting from the research of students of the Faculty of Computer Science, Universitas Brawijaya. The journal is expected to advance research ...