Kim, Minseok
Unknown Affiliation

Published: 1 document

Enhancing Code Similarity with Augmented Data Filtering and Ensemble Strategies
Kim, Gyeongmin; Kim, Minseok; Jo, Jaechoon
JOIV : International Journal on Informatics Visualization Vol 6, No 3 (2022)
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.6.3.1259

Abstract

Although COVID-19 has severely affected the global economy, information technology (IT) employees managed to perform most of their work from home. Telecommuting and remote work have increased the demand for IT services in various market sectors, including retail, entertainment, education, and healthcare. Consequently, computer and information experts are also in demand. However, producing IT experts is difficult during a pandemic owing to limitations such as the reduced enrollment of international students. Therefore, research on increasing software productivity is essential; this study proposes a code similarity determination model that utilizes augmented data filtering and ensemble strategies. It is the first automated development system for increasing software productivity that addresses the current situation: a worldwide shortage of software experts. Pre-trained language models (PLMs) dramatically improve performance in various downstream natural language processing (NLP) tasks. Unlike general-purpose PLMs, CodeBERT and GraphCodeBERT have learned both natural and programming languages; hence, they are suitable as code similarity determination models. The data filtering process consists of three steps: (1) deduplication of the data, (2) deletion of intersections, and (3) an exhaustive search. The Best Matching 25 (BM25) and length-normalized BM25 (BM25L) algorithms were used to construct positive and negative pairs. The performance of the model was evaluated using a 5-fold cross-validation ensemble. Experiments quantitatively demonstrate the effectiveness of the proposed method, and we expect it to be well suited to increasing software productivity across various NLP tasks.
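
The abstract names two reusable building blocks: BM25/BM25L-based construction of positive and negative code pairs, and a 5-fold cross-validation ensemble for the final prediction. The Python sketch below illustrates how those two steps could fit together. It is a minimal illustration under assumptions, not the authors' released code: the function names (build_pairs, ensemble_predict, train_fn, predict_fn), the use of the rank_bm25 and scikit-learn packages, and the top-k pairing rule are all hypothetical choices made for the example.

# Hypothetical sketch of BM25-based pair construction and a 5-fold
# cross-validation ensemble; names and libraries are assumptions,
# not the paper's released implementation.
from rank_bm25 import BM25Okapi, BM25L          # BM25 / BM25L rankers
from sklearn.model_selection import KFold
import numpy as np

def build_pairs(code_snippets, top_k=5):
    """Pair each snippet with its highest-scoring BM25 neighbours
    (candidate positives) and lowest-scoring snippets (candidate negatives)."""
    tokenized = [s.split() for s in code_snippets]
    bm25 = BM25Okapi(tokenized)                 # swap in BM25L(tokenized) for the length-normalized variant
    pairs = []
    for i, query in enumerate(tokenized):
        scores = bm25.get_scores(query)
        scores[i] = -np.inf                     # exclude the snippet itself
        ranked = np.argsort(scores)[::-1]
        positives = ranked[:top_k]              # high-score pairs -> label 1
        negatives = ranked[-top_k:]             # low-score pairs  -> label 0
        pairs += [(i, int(j), 1) for j in positives]
        pairs += [(i, int(j), 0) for j in negatives]
    return pairs

def ensemble_predict(train_fn, predict_fn, X, y, X_test, n_splits=5):
    """5-fold cross-validation ensemble: train one model per fold and
    average the fold models' predictions on the test set."""
    fold_preds = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, _ in kf.split(X):
        model = train_fn(X[train_idx], y[train_idx])   # e.g. fine-tune a CodeBERT/GraphCodeBERT classifier
        fold_preds.append(predict_fn(model, X_test))
    return np.mean(fold_preds, axis=0)

Averaging fold-level predictions is one common way to realize a cross-validation ensemble; the paper's exact ensembling rule and pair-labeling thresholds may differ from this sketch.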