Predictive maintenance (PdM) in industrial manufacturing relies on machine learning classifiers trained on severely imbalanced sensor data, in which failure events make up a small minority of observations. This study presents a controlled factorial experiment that evaluates five algorithms (Decision Tree, Random Forest, SVM, XGBoost, and Logistic Regression) under four imbalance-handling strategies (no handling, SMOTE, ADASYN, and class weighting) on two tasks, binary failure detection and six-class failure mode identification, using the AI4I 2020 dataset (10,000 observations, 3.39% failure rate). The design yields 40 experimental conditions. All oversampling steps were integrated within an ImbPipeline so that resampling was applied only to the training portion of each cross-validation fold, preventing data leakage. In the multiclass task, SMOTE was substituted for ADASYN and for XGBoost class weighting, both of which were unstable with extreme minority classes. Statistical comparisons were conducted with the Friedman test, post-hoc Nemenyi analysis, and one-tailed Wilcoxon signed-rank tests. XGBoost with no handling achieved the highest performance in both tasks (binary F1 = 0.8952; multiclass F1 = 0.6084). Contrary to common practice, the no-handling baseline outperformed SMOTE and ADASYN for four of the five algorithms in the binary task (Wilcoxon, p = 0.0312), while class weighting improved macro recall from 0.8448 to 0.8908 without significant F1 degradation. Per-class analysis showed that heat dissipation, power, and overstrain failures were reliably detected (F1 > 0.82), whereas tool wear and random failures remained undetectable. These findings indicate that synthetic oversampling is not universally beneficial for imbalanced PdM data and that leakage-free experimental design is essential for reliable performance estimation. Practitioners are advised to benchmark no handling and class weighting before applying synthetic oversampling in PdM deployments.