Vulgar and pornographic content has become a widespread issue on the internet, appearing in various fields, including anime. Such content in anime is not limited to sexually themed genres; anime from general genres such as action and adventure also contains vulgar visuals. This research focuses on implementing the Detection Transformer (DETR) object detection method to identify vulgar parts of anime characters, particularly female characters. DETR is a deep learning model designed for object detection that adapts the attention mechanism of Transformers. The dataset consists of 800 images taken from popular anime, selected by viewership rankings, which were augmented to a total of 1,689 images. Models were trained with two backbones, ResNet-50 and ResNet-101, each with dilated convolution applied at different stages. The results show that the DETR model with a ResNet-50 backbone and dilated convolution at stage 5 outperformed the other backbone and dilation configurations, achieving a mean Average Precision of 0.479 and  of 0.875. A further finding is that, although dilated convolution improves small-object detection by enlarging the receptive field, applying it in early stages tends to reduce spatial detail and harm performance on medium and large objects. The primary aim of this research, however, is not solely to achieve the highest performance but to explore the potential of transformer-based models such as DETR for detecting vulgar content in anime. DETR benefits from its ability to capture spatial context through self-attention, which offers potential for further development with larger datasets, more complex architectures, or larger-scale training.
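The receptive-field effect of dilated convolution described above can be illustrated with a minimal sketch. This is not the authors' code: the helper names `effective_kernel_size` and `output_size` are hypothetical, and the 3×3 kernel and 32-pixel input are illustrative assumptions. The sketch relies only on standard convolution arithmetic, where a kernel of size k with dilation d spans d(k − 1) + 1 input pixels along each axis.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def output_size(input_size: int, kernel_size: int, dilation: int,
                stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis, using standard convolution arithmetic."""
    span = effective_kernel_size(kernel_size, dilation)
    return (input_size + 2 * padding - span) // stride + 1


# Illustrative values only (assumed, not taken from the paper):
# a 3x3 kernel on a 32-pixel-wide feature map, with padding equal to the dilation.
for d in (1, 2, 4):
    print(
        f"dilation={d}: kernel spans {effective_kernel_size(3, d)} pixels, "
        f"output width {output_size(32, 3, d, padding=d)}"
    )
```

With the number of weights held fixed, a larger dilation widens the area each output position sees while the feature map keeps its width; by comparison, `output_size(32, 3, 1, stride=2, padding=1)` halves the width. Replacing a stride with dilation in a late backbone stage is one common way to trade downsampling for a wider kernel span, which is assumed here to be the kind of configuration the abstract refers to.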
                        
                        
                        
                        
                            
Copyright © 2025