Journal of Applied Data Sciences
Vol 6, No 3: September 2025

A New Data Preprocessing Framework to Enhance the Accuracy of Herbal Plants Classification Using Deep Learning

Kunlerd, Attapol
Ritthiron, Atipat
Nabumroong, Boonlueo
Luangmaneerote, Sakchan
Chaiwachirakhampon, Anyawee
Kaewyotha, Jakkrit



Article Info

Publish Date
25 Jun 2025

Abstract

This research addresses the problem of herbal plant classification, which plays a key role in Thai pharmacy and traditional medicine. Classification is limited by the similar physical characteristics of many plants and by the reliance on specialists, both of which hinder the use of herbal plants by the general public at the local level. To solve this problem, this research presents a new preprocessing framework called P4, which integrates seven techniques: Image Cropping, Resizing, Normalization (0–1), Data Augmentation, Label Noise, Label Cleaning, and Dataset Quality Score (DQS). The distinguishing feature of P4 is its combination of intentional mislabeling with a label cleaning process, together with quantitative data quality assessment and additional expert review, in order to filter out potentially inaccurate data before it is input to the deep learning model. In the experiment, a dataset of 4,211 herbal images covering 30 herbal plant species is used, and P4 is compared with three preprocessing techniques proposed in previous research (P1–P3) across five deep learning architectures: DenseNet201, EfficientNetB7, ViT, Swin Transformer, and ConvNeXt. The experimental results showed that P4 combined with the DenseNet201 model provided the highest performance in herbal plant classification, with an Accuracy of 92%, a Precision of 92%, a Recall of 91%, and a training time of only 22.92 minutes. This result is attributed to the P4 technique producing higher-quality and more balanced data. Combined with the structural capability of DenseNet201, which supports feature reuse from previous layers, this increased robustness to mislabeled data and allowed the model to accurately distinguish plants with similar characteristics. The results of this experiment can be applied as a guideline for future applications in Thai traditional medicine support systems and herbal plant learning systems.
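As a rough illustration of three of the steps named in the abstract (Image Cropping, Normalization (0–1), and Label Noise), the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the center-crop strategy, the 48-pixel crop size, the 25% noise rate, and the random placeholder images are all assumptions made for illustration. Resizing and Data Augmentation are omitted, and Label Cleaning and DQS are not sketched because the abstract does not define how they are computed.

```python
import numpy as np


def center_crop(img, size):
    """Crop a square of side `size` from the centre of an H x W x C image (assumed strategy)."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


def normalize_01(img):
    """Scale 8-bit pixel values into the [0, 1] range."""
    return img.astype(np.float32) / 255.0


def inject_label_noise(labels, num_classes, rate, rng):
    """Flip a fraction `rate` of the labels to a different, randomly chosen class."""
    labels = np.asarray(labels)
    noisy = labels.copy()
    n_flip = int(round(rate * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[idx] = (labels[idx] + offsets) % num_classes
    return noisy, idx


# Hypothetical data: 8 random 64 x 64 RGB images standing in for herbal plant photos.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
labels = rng.integers(0, 30, size=8)  # 30 species, as in the abstract

batch = np.stack([normalize_01(center_crop(im, 48)) for im in images])
noisy_labels, flipped_idx = inject_label_noise(labels, num_classes=30, rate=0.25, rng=rng)
```

Returning the flipped indices alongside the noisy labels makes it possible to check afterwards how many of the intentionally mislabeled samples a later cleaning step recovers.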

Copyrights © 2025






Journal Info

Abbrev

JADS

Publisher

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...