Abualsoud A. Hanani
Birzeit University

Articles

Journal : International Journal of Electrical and Computer Engineering

Self-admitted technical debt classification using natural language processing word embeddings
Ahmed F. Sabbah; Abualsoud A. Hanani
International Journal of Electrical and Computer Engineering (IJECE) Vol 13, No 2: April 2023
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v13i2.pp2142-2155

Abstract

Recent studies show that technical debt can be detected automatically from source code comments that developers write intentionally, a phenomenon known as self-admitted technical debt. This study proposes a system that classifies a comment or commit as one of five debt types: requirement, design, defect, test, and documentation. In addition to the traditional term frequency-inverse document frequency (TF-IDF), several word embedding methods produced by different pre-trained language models were used for feature extraction, such as Word2Vec, GloVe, bidirectional encoder representations from transformers (BERT), and FastText. The generated features were used to train a set of classifiers including naive Bayes (NB), random forest (RF), support vector machines (SVM), and two configurations of convolutional neural network (CNN). Two datasets were used to train and test the proposed systems. Our collected dataset (A-dataset) includes a total of 1,513 manually labeled comments and commits. A second dataset of 4,071 labeled comments used in previous studies (M-dataset) was also used. The RF classifier achieved an accuracy of 0.822 with the A-dataset and 0.820 with the M-dataset. The CNN achieved an accuracy of 0.838 with the A-dataset using BERT features. With the M-dataset, the CNN achieved an accuracy of 0.809 with BERT and 0.812 with Word2Vec.
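
The TF-IDF plus naive Bayes combination mentioned in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the training comments are invented for illustration and do not come from the A-dataset or M-dataset, and the names `TRAIN`, `tokenize`, `fit_idf`, `tfidf`, `NaiveBayes`, and `classify` are hypothetical. The smoothed IDF formula and the additive smoothing value are assumptions as well.

```python
import math
from collections import Counter, defaultdict

# Toy comments invented for this sketch; NOT taken from the A-dataset or M-dataset.
TRAIN = [
    ("TODO: support the new export format requested by users", "requirement"),
    ("TODO: feature not implemented yet, add pagination support", "requirement"),
    ("hack: this class is too coupled, refactor into smaller modules", "design"),
    ("ugly workaround, refactor this method later", "design"),
    ("FIXME: crashes with null input, bug in parser", "defect"),
    ("bug: wrong result when list is empty, fix later", "defect"),
    ("need unit tests for this function", "test"),
    ("test coverage missing, add tests for edge cases", "test"),
    ("document this parameter in the javadoc", "documentation"),
    ("missing documentation for public api, update docs", "documentation"),
]

def tokenize(text):
    """Lowercase the text and split on any non-alphanumeric character."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

class NaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.class_counts = Counter(labels)
        self.weights = defaultdict(lambda: defaultdict(float))
        self.vocab = set()
        for vec, label in zip(vectors, labels):
            for term, w in vec.items():
                self.weights[label][term] += w
                self.vocab.add(term)
        self.totals = {c: sum(self.weights[c].values()) for c in self.class_counts}
        return self

    def predict(self, vec):
        n = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = self.totals[label] + self.alpha * len(self.vocab)
            score = math.log(count / n)  # log prior
            for term, w in vec.items():
                likelihood = (self.weights[label].get(term, 0.0) + self.alpha) / denom
                score += w * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

idf = fit_idf([text for text, _ in TRAIN])
model = NaiveBayes().fit([tfidf(text, idf) for text, _ in TRAIN],
                         [label for _, label in TRAIN])

def classify(comment):
    """Return the predicted debt type for a single comment."""
    return model.predict(tfidf(comment, idf))

print(classify("FIXME: parser bug causes crash"))
print(classify("add unit tests"))
```

With ten toy comments the output only shows the mechanics; it says nothing about the accuracies reported in the abstract, which come from the full datasets and from stronger models such as RF and CNN.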