Integra: Journal of Integrated Mathematics and Computer Science
Vol. 1 No. 2 (2024): July

Application of Random Forest Method Classification for Glycosylation in Lysine Protein Sequences

Fitriyana, Silfia (Unknown)
Syarif, Admi (Unknown)
Rossyking, Favorisen (Unknown)
Faisal, Mohammad Reza (Unknown)



Article Info

Publish Date
17 Jul 2024

Abstract

Glycosylation classification in lysine protein sequences assigns each lysine protein sequence to a group according to the type of glycosylation observed in that sequence. This work evaluates the random forest method for this task in terms of sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). The lysine protein dataset, derived from benchmark data, contains 620 protein sequences of length 15, of which 214 are positive and 406 are negative. 90% of the dataset is used for training and 10% for testing. Features were extracted with the R package BioSeqClass version 1.44.0 using the protein descriptors AA Index, CTD, and PseAAC, giving a total of 60 features. The extracted features were then classified with the Random Forest algorithm using Mtry values of 4, 8, and 16, with the number of trees (ntree) varied over 250, 500, 750, and 1000. The best results were achieved with the 90% training and 10% test split, using Mtry of 42 and 1000 trees: 89.97% sensitivity, 92.79% specificity, 80.76% MCC, and 90.42% accuracy. These results indicate that the combination of feature extraction and the Random Forest algorithm is effective in classifying glycosylation in lysine protein sequences.
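The four evaluation measures named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, together with the Mtry and ntree grid listed above. It is not the authors' R code: the function name `classification_metrics`, the variable names, and the confusion-matrix counts in the example are hypothetical and chosen only for illustration (the counts sum to 62, which is 10% of 620, but they are not taken from the paper).

```python
import math
from itertools import product


def classification_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy and MCC from
    binary confusion-matrix counts (standard textbook definitions)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # MCC denominator is zero when any row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention, assumed here.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hyperparameter values as listed in the abstract.
MTRY_VALUES = (4, 8, 16)
NTREE_VALUES = (250, 500, 750, 1000)
grid = list(product(MTRY_VALUES, NTREE_VALUES))  # every (mtry, ntree) pair

# Hypothetical counts for a 62-sequence test set (NOT results from the paper).
example = classification_metrics(tp=19, tn=38, fp=3, fn=2)

if __name__ == "__main__":
    print(f"{len(grid)} (mtry, ntree) combinations")
    for name, value in example.items():
        print(f"{name}: {value:.4f}")
```

Because the dataset is imbalanced (214 positive versus 406 negative sequences), MCC is a useful complement to accuracy: it uses all four cells of the confusion matrix and stays near zero for a classifier that simply predicts the majority class.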

Copyrights © 2024






Journal Info

Abbrev

integra

Publisher

Subject

Computer Science & IT; Mathematics

Description

Integra: Journal of Integrated Mathematics and Computer Science is an international journal in the field of Mathematics and Computer Science. It publishes original research work either as a full article or as a short communication, ...