The scarcity of labeled data can hamper the training of text-processing models. To address this issue, a strategy that combines co-training with pseudo-labeling is applied to improve model performance within a semi-supervised learning framework for text processing and comprehension. The model pairs a support vector machine (SVM) for classification with a long short-term memory (LSTM) network for interpreting text sequences. Because co-training can introduce samples that are under-represented in the labeled data, it may help mitigate the class imbalance problem while relying on only a small amount of labeled data, with the remainder left unlabeled. This study evaluates the model's performance on a student dataset from higher education institutions in order to set a confidence threshold for each base model and to assess how well the model generalizes at a given threshold. Using a combination of confidence metrics, the pseudo-labeling threshold was determined to be >=0.88 for the SVM and >=0.5 for the LSTM.
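The abstract describes a co-training loop in which each base model pseudo-labels unlabeled samples that it predicts with sufficient confidence, and those samples are folded back into the labeled pool. The sketch below illustrates one possible reading of that loop, assuming an SVM over a vector feature view (e.g. TF-IDF) and a small Keras LSTM over a token-sequence view, with the unlabeled-sample selection rule requiring both learners to exceed their thresholds and the SVM supplying the pseudo-label. Only the two thresholds (>=0.88 and >=0.5) come from the abstract; the synthetic data, feature views, hyperparameters, and selection rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal co-training sketch with confidence-threshold pseudo-labeling.
# Assumptions (not from the paper): synthetic binary-label data, a TF-IDF-like
# vector view for the SVM, and an integer token-sequence view for the LSTM.
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras import layers, models

SVM_THRESHOLD = 0.88   # pseudo-label only if SVM confidence >= 0.88 (from the abstract)
LSTM_THRESHOLD = 0.5   # pseudo-label only if LSTM confidence >= 0.5 (from the abstract)

rng = np.random.default_rng(0)

# --- placeholder data: two "views" of the same documents -------------------
n_labeled, n_unlabeled, n_features, seq_len, vocab = 100, 400, 50, 20, 200
X_svm_lab = rng.random((n_labeled, n_features))             # vector view
X_lstm_lab = rng.integers(1, vocab, (n_labeled, seq_len))   # token-id view
y_lab = rng.integers(0, 2, n_labeled)

X_svm_unlab = rng.random((n_unlabeled, n_features))
X_lstm_unlab = rng.integers(1, vocab, (n_unlabeled, seq_len))

def build_lstm():
    model = models.Sequential([
        layers.Embedding(vocab, 16),
        layers.LSTM(16),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

for round_ in range(3):                          # a few co-training rounds
    # 1. Train each learner on the current labeled pool.
    svm = SVC(probability=True).fit(X_svm_lab, y_lab)
    lstm = build_lstm()
    lstm.fit(X_lstm_lab, y_lab, epochs=3, verbose=0)

    if len(X_svm_unlab) == 0:
        break

    # 2. Score the unlabeled pool with both learners.
    svm_proba = svm.predict_proba(X_svm_unlab)               # shape (n, 2)
    lstm_proba = lstm.predict(X_lstm_unlab, verbose=0).ravel()

    svm_conf = svm_proba.max(axis=1)
    lstm_conf = np.maximum(lstm_proba, 1.0 - lstm_proba)

    # 3. Keep samples where each learner exceeds its confidence threshold.
    confident = (svm_conf >= SVM_THRESHOLD) & (lstm_conf >= LSTM_THRESHOLD)
    if not confident.any():
        break

    pseudo_y = svm_proba.argmax(axis=1)[confident]           # SVM supplies the label

    # 4. Move the confident samples from the unlabeled to the labeled pool.
    X_svm_lab = np.vstack([X_svm_lab, X_svm_unlab[confident]])
    X_lstm_lab = np.vstack([X_lstm_lab, X_lstm_unlab[confident]])
    y_lab = np.concatenate([y_lab, pseudo_y])
    X_svm_unlab = X_svm_unlab[~confident]
    X_lstm_unlab = X_lstm_unlab[~confident]

    print(f"round {round_}: added {confident.sum()} pseudo-labeled samples")
```

Because the pseudo-labeling step only adds samples both learners agree on with high confidence, the labeled pool can grow to include classes that were under-represented initially, which is the mechanism the abstract points to for easing class imbalance.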