Protein-protein interactions (PPIs) underlie cellular function and disease mechanisms and are central to drug discovery and systems biology. Experimental approaches such as yeast two-hybrid systems are informative, but they are time-consuming, costly, and prone to high false-positive rates. Recent computational tools, including DeepPPI and PIPR, have shown promise, but their reliance on single-modal features or on a specific machine learning model limits their generalization and robustness. These limitations motivate a framework that integrates multiple feature types and combines diverse machine learning models to exploit the strengths of each model class. In this paper, we present HybridPPI, a hybrid machine learning framework that combines sequence-based, structure-based, and network-based features using well-known ensemble learning techniques to predict PPIs. The proposed method is a stacking ensemble of four base models, Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM), with Gradient Boosting as the meta-model. HybridPPI achieves 94.5% accuracy, 95.2% precision, and an Area Under the Curve (AUC) of 0.97, outperforming state-of-the-art methods and indicating its robustness for PPI prediction. The framework is scalable and generalizable and can accommodate a range of biological applications. HybridPPI thus addresses significant shortcomings of current methodologies and contributes to biological discovery.
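The stacking idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the HybridPPI implementation: the data are synthetic, two toy learners (`Centroid`, `KNN`) stand in for the SVM, RF, CNN, and LSTM base models, and a small squared-error stump booster (`StumpBoost`) stands in for the Gradient Boosting meta-model. All names, the 5-fold split, and the hyperparameters are assumptions chosen for illustration only. What the sketch does show is the core mechanism: base models produce out-of-fold scores, and the meta-model is trained on those scores rather than on the raw features.

```python
import math
import random

def make_toy_pairs(n, seed):
    """Synthetic stand-in for fused protein-pair features (not real PPI data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

class Centroid:
    """Toy base learner: score from relative distance to the two class centroids."""
    def fit(self, X, y):
        self.c = []
        for k in (0, 1):
            rows = [x for x, t in zip(X, y) if t == k]
            self.c.append([sum(col) / len(rows) for col in zip(*rows)])
        return self
    def predict_proba(self, X):
        d = [(math.dist(x, self.c[0]), math.dist(x, self.c[1])) for x in X]
        return [d0 / (d0 + d1 + 1e-12) for d0, d1 in d]

class KNN:
    """Toy base learner: share of positives among the k nearest training points."""
    def __init__(self, k=7):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict_proba(self, X):
        out = []
        for x in X:
            near = sorted(zip(self.X, self.y), key=lambda p: math.dist(x, p[0]))
            out.append(sum(t for _, t in near[:self.k]) / self.k)
        return out

class StumpBoost:
    """Toy meta-model: gradient boosting on squared error with depth-1 stumps."""
    def __init__(self, rounds=20, lr=0.3):
        self.rounds, self.lr, self.stumps = rounds, lr, []
    def fit(self, Z, y):
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        for _ in range(self.rounds):
            resid = [t - p for t, p in zip(y, pred)]  # negative gradient of squared error
            stump = self._best_stump(Z, resid)
            if stump is None:  # no threshold separates the samples any further
                break
            self.stumps.append(stump)
            pred = [p + self.lr * self._apply(stump, z) for p, z in zip(pred, Z)]
        return self
    @staticmethod
    def _apply(stump, z):
        j, thr, left, right = stump
        return left if z[j] <= thr else right
    @staticmethod
    def _best_stump(Z, resid):
        best, best_err = None, float("inf")
        for j in range(len(Z[0])):
            for thr in [i / 10 for i in range(1, 10)]:
                lo = [r for z, r in zip(Z, resid) if z[j] <= thr]
                hi = [r for z, r in zip(Z, resid) if z[j] > thr]
                if not lo or not hi:
                    continue
                ml, mh = sum(lo) / len(lo), sum(hi) / len(hi)
                err = sum((r - ml) ** 2 for r in lo) + sum((r - mh) ** 2 for r in hi)
                if err < best_err:
                    best, best_err = (j, thr, ml, mh), err
        return best
    def predict(self, Z):
        score = lambda z: self.base + self.lr * sum(self._apply(s, z) for s in self.stumps)
        return [int(score(z) > 0.5) for z in Z]

def out_of_fold(makers, X, y, folds=5):
    """Meta-features: each base model scores only samples it was not trained on."""
    Z = [[0.0] * len(makers) for _ in X]
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        for m, make in enumerate(makers):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(test, model.predict_proba([X[i] for i in test])):
                Z[i][m] = p
    return Z

makers = [Centroid, KNN]
X_train, y_train = make_toy_pairs(300, seed=0)
X_test, y_test = make_toy_pairs(100, seed=1)

meta = StumpBoost().fit(out_of_fold(makers, X_train, y_train), y_train)
fitted = [make().fit(X_train, y_train) for make in makers]  # refit on all training data
Z_test = list(zip(*(m.predict_proba(X_test) for m in fitted)))
pred = meta.predict(Z_test)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"toy hold-out accuracy: {accuracy:.2f}")
```

Training the meta-model on out-of-fold scores, rather than on scores from base models that have already seen the same samples, keeps it from simply trusting whichever base model overfits most. The accuracy printed here reflects only the easy synthetic data and says nothing about the results reported for HybridPPI.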
Copyright © 2025