International Journal of Informatics and Communication Technology (IJ-ICT)

Vol 13, No 3: December 2024

Jenkins, Thomas
Goodwin, Autumn
Talafha, Sameerah

Publish Date
01 Dec 2024

In classification problems, mislabeled data can severely degrade the performance of a trained model. The traditional remedy is expert review, but this is often impractical: many classification datasets, such as the image datasets supporting deep learning models, are large, and human experts available to review them are scarce. Herein, we propose an ordered sample consensus (ORSAC) method that supports data cleaning by flagging mislabeled data. The method is inspired by the random sample consensus (RANSAC) method for outlier detection. In short, it iteratively trains and tests a model on different splits of the dataset, records misclassifications, and flags data that is frequently misclassified as probably mislabeled. We evaluate the method by purposefully mislabeling subsets of data and assessing the method’s ability to find them. Experiments on three datasets (a mosquito image dataset, CIFAR-10, and CIFAR-100) at different mislabeling frequencies indicate that the method finds mislabeled data with a high degree of accuracy.
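
The loop described in the abstract (train and test on different splits, record misclassifications, flag frequent offenders) can be illustrated with a minimal sketch. This is not the authors' implementation. The nearest-centroid stand-in model, the random splitting, the function names, and the parameters `rounds`, `test_frac`, and `threshold` are all assumptions made for illustration, and the sketch does not attempt to reproduce whatever ordering gives ORSAC its name, since the abstract does not describe it.

```python
import random


def fit_centroids(points, labels):
    """Hypothetical stand-in model: one mean vector per class label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
            for lab, ps in groups.items()}


def predict(centroids, p):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(p, centroids[lab])))


def flag_suspects(points, labels, rounds=50, test_frac=0.3, threshold=0.8, seed=0):
    """Flag indices that are misclassified in most of the rounds where they were tested."""
    rng = random.Random(seed)
    n = len(points)
    tested = [0] * n   # times each sample landed in a test split
    missed = [0] * n   # times it was misclassified while in a test split
    n_test = max(1, int(n * test_frac))
    for _ in range(rounds):
        idx = list(range(n))
        rng.shuffle(idx)  # assumption: plain random splits
        test, train = idx[:n_test], idx[n_test:]
        model = fit_centroids([points[i] for i in train],
                              [labels[i] for i in train])
        for i in test:
            tested[i] += 1
            if predict(model, points[i]) != labels[i]:
                missed[i] += 1
    return [i for i in range(n)
            if tested[i] and missed[i] / tested[i] >= threshold]


# Toy data: two well-separated clusters with three deliberately flipped labels,
# mirroring the abstract's evaluation idea of purposefully mislabeling a subset.
data_rng = random.Random(1)
points = ([(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(40)]
          + [(data_rng.uniform(4, 6), data_rng.uniform(4, 6)) for _ in range(40)])
labels = [0] * 40 + [1] * 40
flipped = [3, 17, 55]
for i in flipped:
    labels[i] = 1 - labels[i]

suspects = flag_suspects(points, labels)
print(suspects)
```

On this toy data the clusters are far enough apart that the flipped samples are misclassified every time they are tested, so they are the ones flagged. On real image data the stand-in model would be replaced by the classifier under study, and the threshold would need tuning.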


Journal Info

International Journal of Informatics and Communication Technology (IJ-ICT)

Abbrev

IJICT

Publisher

Institute of Advanced Engineering and Science

Subject

Computer Science & IT

Description

International Journal of Informatics and Communication Technology (IJ-ICT) is a common platform for publishing quality research papers as well as other intellectual outputs. The journal is published by the Institute of Advanced Engineering and Science (IAES), whose aim is to promote the dissemination of ...


An ORSAC method for data cleaning inspired by RANSAC
