Detecting contradictions in low-resource legislative texts is difficult because labeled data are scarce, legal language is complex, and legal documents contain a very large number of verses. Left unresolved, such contradictions can lead to legal ambiguity and disputes. To address this problem, this study proposes a system that combines document matching with contradiction detection. Legal documents are first clustered by contextual similarity, so that the search for contradictory verses can be restricted to related documents. Among the clustering approaches tested, keyword similarity-based clustering using KeyBERT produced the highest MatchingScore, 0.6111. To overcome the scarcity of labeled data, we used a multi-step strategy combining manual annotation, generative AI-based data augmentation, and self-training. The contradiction detection model is based on the XLM-RoBERTa architecture and was trained on a TPU v2 with a batch size of 64. It achieved 0.978 recall, 0.9356 precision, 0.982 accuracy, and a 0.9566 F1-score, with each epoch completing in 82 seconds. By narrowing the comparison to matched documents, this integrated approach reduces the complexity of contradiction detection in legislative documents while maintaining high accuracy and robustness.
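The idea of clustering documents before looking for contradictory verses can be illustrated with a minimal sketch. It is not the system described above: the keyword sets are hand-written stand-ins for KeyBERT output, Jaccard overlap stands in for the keyword similarity measure, and the greedy clustering rule, the 0.3 threshold, and all names (`cluster_by_keywords`, `candidate_pairs`, `law_a`, and so on) are assumptions made for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two keyword sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(doc_keywords, threshold=0.3):
    """Greedy single-pass clustering (an assumed rule, not the paper's).

    A document joins the first cluster whose seed document (its first
    member) has Jaccard similarity >= threshold; otherwise it starts a
    new cluster.
    """
    clusters = []
    for doc_id, keywords in doc_keywords.items():
        for cluster in clusters:
            if jaccard(keywords, doc_keywords[cluster[0]]) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def candidate_pairs(clusters, doc_verses):
    """Yield cross-document verse pairs, restricted to documents in the same cluster."""
    for cluster in clusters:
        for d1, d2 in combinations(cluster, 2):
            for v1 in doc_verses[d1]:
                for v2 in doc_verses[d2]:
                    yield (d1, v1), (d2, v2)


# Hypothetical keyword sets, standing in for keywords extracted per document.
doc_keywords = {
    "law_a": {"tax", "income", "rate"},
    "law_b": {"tax", "income", "exemption"},
    "law_c": {"forest", "permit", "logging"},
}

# Hypothetical verses for each document.
doc_verses = {
    "law_a": ["verse 1", "verse 2"],
    "law_b": ["verse 1", "verse 2"],
    "law_c": ["verse 1"],
}

clusters = cluster_by_keywords(doc_keywords)
pairs = list(candidate_pairs(clusters, doc_verses))
print(clusters)
print(len(pairs))
```

In this toy input, the two tax-related documents share a cluster, so only their verses are paired with each other; the forestry document is never compared against them. Each surviving pair would then be passed to a contradiction classifier, which is outside the scope of this sketch.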
Copyright © 2025