Given the rapid growth of machine learning publications on platforms such as arXiv, systematic approaches are needed to understand their objectives and contributions. This study analyzes scientific intentions across domains, identifies research trends, and evaluates the impact of external contextual enrichment on automatic intent classification. We perform a cross-domain comparison of research objectives, methodological designs, and application scenarios in machine learning publications, focusing on computer science and biology. We propose IntentBERT-Wiki, an enhanced BERT model enriched with contextual knowledge from Wikipedia, designed for intent classification in scientific documents. Our dataset comprises annotated sentences extracted from arXiv articles, categorized according to established rhetorical role taxonomies. The model's performance is evaluated using standard classification metrics and compared to a baseline BERT model. Experimental results show that IntentBERT-Wiki achieves F1-scores of 95.9% in computer science and 87.4% in biology, with corresponding accuracies of 96.5% and 91.4%, outperforming the baseline. These findings demonstrate that Wikipedia-based contextual enrichment can significantly improve intent classification accuracy, enhance the organization of academic discourse, and facilitate cross-domain knowledge transfer. This study contributes to the understanding of how machine learning research is framed across disciplines and provides a scalable framework for scientific content analysis.
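The contextual enrichment step described above can be sketched as pairing each input sentence with retrieved Wikipedia context before it reaches the classifier. The snippet below is a minimal, hypothetical illustration of that idea; the function name, the `[SEP]` concatenation scheme, and the toy snippet dictionary are assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of Wikipedia-based contextual enrichment for intent
# classification. The enriched string would then be tokenized and fed to a
# BERT-style classifier; names here are illustrative, not from the paper.

def enrich_with_context(sentence: str, wiki_snippets: dict) -> str:
    """Append Wikipedia snippets for any terms matched in the sentence,
    joined with a [SEP] marker so the model sees sentence + context."""
    context = " ".join(
        snippet
        for term, snippet in wiki_snippets.items()
        if term.lower() in sentence.lower()
    )
    return f"{sentence} [SEP] {context}" if context else sentence

# Toy knowledge source standing in for retrieved Wikipedia text.
wiki = {"BERT": "BERT is a transformer-based language model."}

enriched = enrich_with_context(
    "We fine-tune BERT for rhetorical role classification.", wiki
)
print(enriched)
```

A sentence with no matched terms passes through unchanged, so the enrichment degrades gracefully to the plain-BERT baseline input.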
Copyright © 2025