The development of software applications involves translating software requirement specifications (SRS) into structured models that guide system design. Among these models, sequence diagrams are essential for visualizing dynamic interactions, but constructing them manually from natural language descriptions is often error-prone and time-consuming. This study proposes an automated method for extracting sequence diagram elements, namely classes, subclasses, and attributes, from the scenario sections of SRS documents. The approach leverages Natural Language Processing (NLP) techniques, combining Bidirectional Encoder Representations from Transformers (BERT) for contextual embeddings with a Support Vector Machine (SVM) for classification. Noun phrases are identified and then classified into UML-relevant entities using this hybrid model. To evaluate performance, two datasets, SIData and SILo, were used, each exhibiting distinct textual styles and domain characteristics. The system’s effectiveness was assessed using standard evaluation metrics, including precision, recall, and F1-score. Results indicate that the method is capable of capturing contextual relationships between extracted elements, although its performance varies across datasets, suggesting the need for further refinement. Overall, the study contributes toward automating early software design phases and reducing manual modeling effort.
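As an illustration of the evaluation metrics named above, the following minimal Python sketch computes per-label precision, recall, and F1-score for a list of classified noun phrases. It is not the study's implementation: the function name `per_label_scores`, the label set (`class`, `subclass`, `attribute`, `other`), and the toy gold and predicted labels are all assumptions made for the example, and the numbers do not come from SIData or SILo.

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for two equal-length label lists.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed

    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy, made-up labels for six noun phrases (assumed label set).
gold = ["class", "class", "attribute", "subclass", "attribute", "other"]
pred = ["class", "attribute", "attribute", "subclass", "class", "other"]

scores = per_label_scores(gold, pred)
for label, (p, r, f1) in scores.items():
    print(f"{label:10s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting the scores per label, rather than as a single aggregate, makes it easier to see which entity type (for example, attributes versus classes) accounts for a performance difference between datasets.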
Copyrights © 2025