This study evaluates the accuracy of image recognition for explicit content detection, using the Indonesian social context as a comparative reference. Google Vision SafeSearch is employed as a representative automated image recognition system widely used in online content moderation. Although such systems detect adult, violent, or racy content efficiently, challenges arise when their outputs must align with more conservative cultural and religious norms, such as those in Indonesia. A quantitative descriptive-comparative method was applied: six representative images spanning the SafeSearch explicit content categories (adult, racy, violence, medical, and spoof) were tested, and the automated detections were compared with Indonesian respondents’ perceptions collected through a Likert-scale questionnaire. Statistical analysis shows a significant difference between the system’s explicit content classifications and human perceptions, with respondents consistently rating explicitness higher than the Google Vision API does. Despite this difference, a strong Spearman rank correlation indicates that Google Vision SafeSearch ranks explicit content levels consistently, although it remains limited in capturing emotional intensity and cultural sensitivity. These findings highlight how Indonesian social and cultural norms shape the perception of explicit imagery, underscoring the need for image recognition systems that incorporate local contextual factors.
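The comparison described above can be sketched in code. The snippet below is an illustrative sketch only, not the study's actual data or analysis script: it maps SafeSearch's ordinal likelihood labels (VERY_UNLIKELY through VERY_LIKELY, which the Cloud Vision API does return) onto a 1-5 scale, pairs them with hypothetical mean Likert ratings for six images, and computes Spearman's rank correlation by hand. All numeric values are placeholders chosen for illustration.

```python
from math import sqrt

# SafeSearch reports ordinal likelihood labels; map them onto 1..5
# so they are comparable with a 5-point Likert scale.
LIKELIHOOD = {
    "VERY_UNLIKELY": 1, "UNLIKELY": 2, "POSSIBLE": 3,
    "LIKELY": 4, "VERY_LIKELY": 5,
}

def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical per-image scores for six test images (placeholders,
# not the study's data). SafeSearch labels vs. mean Likert ratings.
safesearch = [LIKELIHOOD[label] for label in
              ["VERY_LIKELY", "LIKELY", "POSSIBLE",
               "UNLIKELY", "VERY_UNLIKELY", "UNLIKELY"]]
likert_means = [4.9, 4.6, 3.8, 2.9, 1.5, 2.4]

rho = spearman_rho(safesearch, likert_means)
print(f"Spearman rho = {rho:.3f}")
```

A rho near 1 would indicate that the system orders the images by explicitness much as respondents do, even when the absolute severity ratings diverge, which is the pattern the study reports.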