The exponential growth of scientific literature on platforms such as arXiv makes it increasingly difficult to identify and compare key contributions to machine learning across diverse academic domains. To address this, we propose GraphiBERT-ML, a knowledge-enhanced extension of BERT that integrates semantic embeddings extracted from DBpedia to improve named entity recognition (NER) in scientific articles. To the best of our knowledge, this is the first knowledge-enhanced NER model that explicitly integrates DBpedia-based embeddings for large-scale, cross-domain scientific analysis. The model was evaluated on a cross-domain dataset spanning eight fields, including computer science, physics, mathematics, biology, finance, and economics. Experimental results show that GraphiBERT-ML achieves its highest performance in computer science, with an accuracy of 0.9372, an F1-score of 0.9368, and a precision of 0.9376. Physics and mathematics also perform strongly (F1-scores of 0.9115 and 0.8970, respectively), while more heterogeneous domains such as biology and finance score lower (F1-scores of 0.7946 and 0.7872, respectively), reflecting the complexity and variability of their terminology. Across all domains, GraphiBERT-ML consistently outperformed the baseline BERT model, confirming the benefit of external knowledge integration for scientific NER. These findings highlight domain-specific challenges in entity extraction and demonstrate the potential of knowledge-augmented models to advance cross-disciplinary analysis of machine learning research.
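To make the knowledge-enhancement idea concrete, the sketch below shows one simple way a BERT-based token classifier can be fused with per-token knowledge-graph entity embeddings (e.g., vectors derived from DBpedia). This is a minimal illustrative sketch, not the paper's published architecture: the class name `KnowledgeEnhancedNER`, the `kg_embeds` input, the concatenation-plus-projection fusion, and all layer sizes are assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn
from transformers import BertModel


class KnowledgeEnhancedNER(nn.Module):
    """Token classification over BERT, fused with external knowledge-graph
    entity embeddings. Hypothetical sketch: the fusion strategy and
    dimensions are assumptions, not GraphiBERT-ML's actual design."""

    def __init__(self, num_labels, kg_dim=128, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Project concatenated [BERT; KG] features back to hidden size.
        self.fuse = nn.Linear(hidden + kg_dim, hidden)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, kg_embeds):
        # kg_embeds: (batch, seq_len, kg_dim), the embedding of the DBpedia
        # entity linked to each token (zeros where no entity is linked).
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        fused = torch.tanh(
            self.fuse(torch.cat([out.last_hidden_state, kg_embeds], dim=-1))
        )
        # Per-token label logits: (batch, seq_len, num_labels).
        return self.classifier(self.dropout(fused))
```

Concatenation followed by a linear projection is only one fusion choice; gating or attention over candidate linked entities are common alternatives in the knowledge-enhanced NER literature.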