JOIV : International Journal on Informatics Visualization
Vol 6, No 3 (2022)

Enhancing Code Similarity with Augmented Data Filtering and Ensemble Strategies

Kim, Gyeongmin (Unknown)
Kim, Minseok (Unknown)
Jo, Jaechoon (Unknown)



Article Info

Publish Date
30 Sep 2022

Abstract

Although COVID-19 has severely affected the global economy, information technology (IT) employees managed to perform most of their work from home. Telecommuting and remote work have promoted a demand for IT services in various market sectors, including retail, entertainment, education, and healthcare. Consequently, computer and information experts are also in demand. However, producing IT, experts is difficult during a pandemic owing to limitations, such as the reduced enrollment of international students. Therefore, researching increasing software productivity is essential; this study proposes a code similarity determination model that utilizes augmented data filtering and ensemble strategies. This algorithm is the first automated development system for increasing software productivity that addresses the current situation—a worldwide shortage of software dramatically improves performance in various downstream natural language processing tasks (NLP). Unlike general-purpose pre-trained language models (PLMs), CodeBERT and GraphCodeBERT are PLMs that have learned both natural and programming languages. Hence, they are suitable as code similarity determination models. The data filtering process consists of three steps: (1) deduplication of data, (2) deletion of intersection, and (3) an exhaustive search. The best mating (BM) 25 and length normalization of BM25 (BM25L) algorithms were used to construct positive and negative pairs. The performance of the model was evaluated using the 5-fold cross-validation ensemble technique. Experiments demonstrate the effectiveness of the proposed method quantitatively. Moreover, we expect this method to be optimal for increasing software productivity in various NLP tasks.

Copyrights © 2022






Journal Info

Abbrev

joiv

Publisher

Subject

Computer Science & IT

Description

JOIV : International Journal on Informatics Visualization is an international peer-reviewed journal dedicated to interchange for the results of high quality research in all aspect of Computer Science, Computer Engineering, Information Technology and Visualization. The journal publishes state-of-art ...