User-generated reviews of mobile applications are a valuable but ambiguous source for classifying software requirements, particularly when a single review combines several aspects, such as bugs, feature requests, and user experiences. Although prior studies have shown that transformer-based and multi-label models can improve the accuracy and efficiency of text classification, the explicit handling of semantic ambiguity in multi-aspect reviews has not been addressed. This study proposes a multi-label classification approach that uses BERT-based transfer learning to manage ambiguity in app reviews. Each review is manually annotated with one or more relevant requirement categories. Preprocessing consists of text cleaning, normalization, and BERT tokenization, which convert reviews into structured representations. The model assigns each review to one or more of four classes: bug reports, feature requests, user experiences, and ratings. The evaluation yields F1-scores of 0.96 for bug reports, 0.95 for feature requests, 0.97 for ratings, and 0.80 for user experiences, indicating that the model captures overlapping labels in ambiguous reviews, although user experiences remain the weakest class. This approach offers a scalable, automated way to extract software requirements, helping developers identify, categorize, and prioritize user needs from unstructured review data.
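The multi-label setup described above can be illustrated with a minimal sketch. It is not the study's implementation: the label identifiers, the sigmoid activation, the 0.5 decision threshold, and the toy logits are all assumptions chosen for illustration, and the logits stand in for the outputs a fine-tuned BERT classification head might produce. The sketch shows how one review can carry several labels at once and how an F1-score is computed separately for each class.

```python
import math

# Hypothetical label identifiers for the four classes named in the abstract.
LABELS = ["bug_report", "feature_request", "user_experience", "rating"]


def to_multi_hot(tags):
    """Encode a set of annotated categories as a 0/1 vector over LABELS."""
    return [1 if label in tags else 0 for label in LABELS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(logits, threshold=0.5):
    """Decide each label independently (assumed: sigmoid + fixed threshold)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def per_label_f1(y_true, y_pred):
    """Compute F1 = 2TP / (2TP + FP + FN) for each label separately."""
    scores = {}
    for j, label in enumerate(LABELS):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy annotations: the first and third reviews each carry two labels.
y_true = [
    to_multi_hot({"bug_report", "feature_request"}),
    to_multi_hot({"rating"}),
    to_multi_hot({"user_experience", "bug_report"}),
]

# Made-up logits standing in for model outputs (not results from the study).
logits = [
    [2.1, 0.7, -1.3, -2.0],
    [-1.5, -0.9, 0.4, 1.8],
    [1.2, -2.2, -0.3, -1.1],
]

y_pred = [predict(row) for row in logits]
scores = per_label_f1(y_true, y_pred)

for label in LABELS:
    print(f"{label}: F1 = {scores[label]:.2f}")
```

Because each label is thresholded independently, a review is never forced into a single class, which is what allows overlapping categories to be captured. Reporting F1 per label, as the abstract does, also makes a weaker class visible instead of hiding it in an average.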
Copyright © 2026