International Journal of Electrical and Computer Engineering
Vol 13, No 2: April 2023

Self-admitted technical debt classification using natural language processing word embeddings

Ahmed F. Sabbah (Birzeit University)
Abualsoud A. Hanani (Birzeit University)



Article Info

Publish Date
01 Apr 2023

Abstract

Recent studies show that it is possible to detect technical dept automatically from source code comments intentionally created by developers, a phenomenon known as self-admitted technical debt. This study proposes a system by which a comment or commit is classified as one of five dept types, namely, requirement, design, defect, test, and documentation. In addition to the traditional term frequency-inverse document frequency (TF-IDF), several word embeddings methods produced by different pre-trained language models were used for feature extraction, such as Word2Vec, GolVe, bidirectional encoder representations from transformers (BERT), and FastText. The generated features were used to train a set of classifiers including naive Bayes (NB), random forest (RF), support vector machines (SVM), and two configurations of convolutional neural network (CNN). Two datasets were used to train and test the proposed systems. Our collected dataset (A-dataset) includes a total of 1,513 comments and commits manually labeled. Additionally, a dataset, consisting of 4,071 labeled comments, used in previous studies (M-dataset) was also used in this study. The RF classifier achieved an accuracy of 0.822 with A-dataset and 0.820 with the M-dataset. CNN with A-dataset achieved an accuracy of 0.838 using BERT features. With M-dataset, the CNN achieves an accuracy of 0.809 and 0.812 with BERT and Word2Vec, respectively.

Copyrights © 2023






Journal Info

Abbrev

IJECE

Publisher

Subject

Computer Science & IT Electrical & Electronics Engineering

Description

International Journal of Electrical and Computer Engineering (IJECE, ISSN: 2088-8708, a SCOPUS indexed Journal, SNIP: 1.001; SJR: 0.296; CiteScore: 0.99; SJR & CiteScore Q2 on both of the Electrical & Electronics Engineering, and Computer Science) is the official publication of the Institute of ...