Real-time crime scene detection is vital for ensuring security in a variety of environments. Building on recent advances in machine learning, this paper presents an IoT framework for real-time weapon and face detection. A convolutional neural network (CNN) deployed in Vertex AI, together with the portable camera module of a Raspberry Pi, is used to detect whether a person is carrying a weapon. The images are first pre-processed by resizing and annotating them, and the CNN model is then trained and validated on the annotated, labeled dataset. The trained model is saved in Google Cloud’s Vertex AI portal. We tested the model by uploading live camera images, as well as a few video clips, to a Django application hosted on Amazon Web Services (AWS), which passes them to Vertex AI. The model achieved an accuracy of 97.2% and an F1 score of 0.97. In addition, it outperforms other state-of-the-art models, attaining higher accuracy with fewer trainable parameters.
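For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and the F1 score are conventionally computed from the confusion counts of a binary detector (weapon present or absent). The function names and the counts in the usage note are hypothetical and chosen only for illustration; they are not the evaluation code or data of this work.

```python
def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (assumed, not taken from the paper):
# tp = weapons correctly flagged, tn = non-weapons correctly passed,
# fp = false alarms, fn = missed weapons.
example_accuracy = accuracy_from_counts(tp=8, tn=8, fp=2, fn=2)
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
```

Because the F1 score ignores true negatives, it remains informative when weapon images are much rarer than non-weapon images, which is why it is typically reported alongside accuracy.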