Positive-Unlabeled (PU) learning has become a pivotal tool in scenarios where only positive samples are labeled and negative labels are unavailable. In practical applications, however, the labeled positive data often contains noise, such as mislabeled or outlier instances, that can severely degrade model performance. This issue is exacerbated by traditional surrogate loss functions, many of which are unbounded and therefore overly sensitive to mislabeled examples. To address this limitation, we propose a robust PU learning framework that integrates bounded loss functions, including the ramp loss and the truncated logistic loss, into the non-negative risk estimation paradigm. Unlike conventional loss formulations that allow noisy samples to disproportionately influence training, our approach caps each instance's contribution to the empirical risk, thereby reducing sensitivity to label noise. We reformulate the PU risk estimator with bounded surrogates and show that this formulation preserves risk consistency while offering improved noise tolerance. A detailed framework diagram and algorithmic description are provided, along with a theoretical analysis that bounds the influence of corrupted labels. Extensive experiments on both synthetic and real-world datasets under varying noise levels show that our method consistently outperforms baseline models such as unbiased PU (uPU) and non-negative PU (nnPU) in classification accuracy, area under the receiver operating characteristic curve (ROC AUC), and area under the precision-recall curve (PR AUC). The ramp loss variant exhibits particularly strong robustness without sacrificing optimization efficiency. These results demonstrate that incorporating bounded losses is a principled and effective strategy for enhancing the reliability of PU learning in noisy environments.
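As a rough illustration of the estimator described above, the sketch below combines the non-negative PU risk with a bounded ramp loss. It assumes a PyTorch setup, a known (or separately estimated) class prior, and model scores already computed on the labeled-positive and unlabeled sets; the names `ramp_loss`, `nnpu_risk`, and `pi_p` are illustrative placeholders, not identifiers from the paper.

```python
import torch

def ramp_loss(z):
    # Bounded ramp loss: 0.5 * max(0, min(2, 1 - z)).
    # The value lies in [0, 1], so any single (possibly mislabeled)
    # example contributes at most a fixed amount to the risk.
    return 0.5 * torch.clamp(1.0 - z, min=0.0, max=2.0)

def nnpu_risk(scores_p, scores_u, pi_p, loss_fn=ramp_loss):
    # Non-negative PU risk with a bounded surrogate loss.
    # scores_p: model outputs g(x) on labeled positive samples
    # scores_u: model outputs g(x) on unlabeled samples
    # pi_p:     class prior P(y = +1), assumed known or estimated
    risk_p_pos = loss_fn(scores_p).mean()      # positives scored as positive
    risk_p_neg = loss_fn(-scores_p).mean()     # positives scored as negative
    risk_u_neg = loss_fn(-scores_u).mean()     # unlabeled scored as negative
    neg_part = risk_u_neg - pi_p * risk_p_neg  # estimated negative-class risk
    # Clamp the negative-class term at zero to keep the estimator non-negative.
    return pi_p * risk_p_pos + torch.clamp(neg_part, min=0.0)
```

Because the ramp loss is bounded, the per-instance terms in `risk_p_pos` cannot grow without limit for outliers in the labeled positive set, which is the mechanism the abstract refers to when it says each instance's contribution is capped.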