This study evaluates Carrier-style memory malware detection under obfuscation using a reproducible, audit-driven protocol for verifiable reporting. We reproduce a stacking pipeline (Naive Bayes, Random Forest, and Decision Tree base learners with a Logistic Regression meta-learner) and benchmark it against strong single-model baselines. To limit leakage, we apply exact deduplication, train-only preprocessing, and group-disjoint splitting with explicit overlap checks, and we report dataset difficulty diagnostics to help interpret near-ceiling results. Transfer is tested via cross-collection evaluation on the shared feature intersection between Obfuscated MalMem2022 and MemMalDet 2024, separating a low-shift validation setting from a higher-shift stress setting so that generalization claims stay bounded. Robustness is assessed under a feasibility-preserving feature-space threat model that enforces empirical bounds, non-negativity, and integer rounding, using a coordinate-search attack on the clean-correct subset across L0 budgets B = 1, 3, 5, and 10, with confidence intervals. On Obfuscated MalMem2022, Random Forest achieves 99.99% accuracy, 99.99% F1, and 1.00 AUC, while the Carrier-style stack reaches 99.92% accuracy, 99.92% F1, and 1.00 AUC, so stacking offers no meaningful improvement over the best single model. Cross-collection validation yields F1 = 99.98% and AUC = 1.00, consistent with low-shift stability under aligned features rather than broad domain generalization. At B = 10, the attack success rate (ASR) is 0.03 (95% CI: 0.0138–0.0639), and baseline defenses show clean-versus-robust trade-offs without consistent ASR reduction. We release four reusable artifacts: an audit table, a leakage ablation matrix, a shift-aware cross-collection report, and robustness curves with confidence intervals.
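As a worked illustration of how an interval like the one reported for ASR at B = 10 can be obtained, the sketch below computes a Wilson score interval for a binomial proportion. The abstract does not state which interval method or sample size was used, so both are assumptions here: the function name `wilson_interval` is hypothetical, and the counts (6 successful attacks out of 200 clean-correct samples) are illustrative values chosen only because they are consistent with an ASR of 0.03.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half_width, center + half_width


# Assumed counts, NOT taken from the source: 6 successes in 200 attacked samples.
low, high = wilson_interval(6, 200)
print(f"ASR = {6 / 200:.2f}, 95% CI = [{low:.4f}, {high:.4f}]")
```

Under these assumed counts the interval rounds to roughly 0.0138 to 0.0639. The Wilson form is a common choice for small proportions because, unlike the normal-approximation interval, it stays inside [0, 1] and does not collapse when the observed rate is near zero.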