Irfan Subakti, Misbakhul Munir
Unknown Affiliation

Published: 1 document

OVERSAMPLING HYBRID METHOD FOR HANDLING MULTI-LABEL IMBALANCED
Tursina, Dara; Anggraeni, Sherly Rosa; Fatichah, Chastine; Irfan Subakti, Misbakhul Munir
JUTI: Jurnal Ilmiah Teknologi Informasi Vol. 22, No. 1, January 2024
Publisher : Department of Informatics, Institut Teknologi Sepuluh Nopember

DOI: 10.12962/j24068535.v22i1.a1208

Abstract

Data and information continue to grow with the development of digital technology, and the available data are becoming increasingly numerous and complex. Imbalanced data cause classification errors because majority-class data dominate the minority class. Data imbalance is not limited to binary classification. It is also common in multi-label data, which have become increasingly important in recent years because of their wide range of applications. Class imbalance is a characteristic of many complex multi-label datasets, which makes it the focus of this research, and methods for handling imbalanced multi-label data still leave considerable room for improvement. Two existing approaches are Synthetic Oversampling of Multi-Label Data Based on Local Label Distribution (MLSOL) and Integrating Unsupervised Clustering and Label-specific Oversampling to Tackle Imbalanced Multi-Label Data (UCLSO). UCLSO focuses only on the majority class, which can lead to data imbalance and excessive overfitting. Although it is effective in preventing majority-class domination, it cannot overcome the lack of variation within the minority class. By contrast, MLSOL focuses on the minority classes, introducing variation into the multi-label data and significantly improving classification performance. This research aims to overcome data imbalance by combining the MLSOL and UCLSO oversampling methods, with the expectation that the combination exploits the strengths and reduces the weaknesses of each, yielding significant performance improvements. The experimental results show that the hybrid oversampling method achieves the highest score on the biological data, with an F1 score of 88%, whereas on the same data the single oversampling methods UCLSO and MLSOL achieve F1 scores of 67% and 62%, respectively.
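The abstract does not describe how MLSOL, UCLSO, or the hybrid method generate synthetic data. As a rough illustration of what multi-label oversampling involves, the following sketch (assuming NumPy) computes the per-label imbalance ratio (IRLbl), treats labels above the mean ratio (MeanIR) as minority labels, and creates synthetic instances by SMOTE-style interpolation between a minority-label seed and one of its nearest neighbors. This is a generic, simplified technique, not the paper's method. The function names `label_imbalance_ratios` and `oversample_minority` and the rule that a synthetic instance copies its seed's labels are hypothetical.

```python
import numpy as np


def label_imbalance_ratios(Y):
    """Per-label imbalance ratio (IRLbl): count of the most frequent label
    divided by the count of each label, so rarer labels get larger values.
    Assumes every label occurs at least once in Y."""
    counts = Y.sum(axis=0).astype(float)
    return counts.max() / counts


def oversample_minority(X, Y, n_new, k=5, seed=0):
    """Generic SMOTE-style oversampling for multi-label data (illustrative
    sketch; NOT the MLSOL, UCLSO, or hybrid procedure).

    X: (n, d) feature matrix; Y: (n, L) binary label matrix.
    Returns X and Y with n_new synthetic rows appended."""
    rng = np.random.default_rng(seed)
    ir = label_imbalance_ratios(Y)
    # Minority labels: IRLbl above the mean imbalance ratio (MeanIR)
    minority = ir > ir.mean()
    # Seeds: instances carrying at least one minority label
    seeds = np.flatnonzero(Y[:, minority].any(axis=1))
    if n_new <= 0 or seeds.size == 0 or len(X) < 2:
        return X.copy(), Y.copy()
    k = min(k, len(X) - 1)

    new_X, new_Y = [], []
    for _ in range(n_new):
        i = rng.choice(seeds)
        # Euclidean distance from the seed to every instance, excluding itself
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf
        j = rng.choice(np.argsort(dist)[:k])
        # Interpolate features at a random point between seed and neighbor
        gap = rng.random()
        new_X.append(X[i] + gap * (X[j] - X[i]))
        # Hypothetical simplification: copy the seed's label set
        new_Y.append(Y[i].copy())
    return np.vstack([X, np.array(new_X)]), np.vstack([Y, np.array(new_Y)])
```

Copying the seed's labels is the crudest possible labeling rule. The name of MLSOL indicates that it uses the local label distribution instead, which this sketch does not attempt to reproduce.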