The rapid expansion of the Internet of Things (IoT) has increased connectivity across many sectors but has also exposed systems to new and evolving cybersecurity threats. Among the most critical is reconnaissance, the phase in which attackers gather system information to prepare more sophisticated intrusions. Conventional intrusion detection systems often fail to detect reconnaissance because its traffic closely resembles benign traffic. To address this problem, this study proposes a hybrid detection framework that combines autoencoder-based feature extraction with a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) classifier. The autoencoder, an unsupervised neural network that compresses input data and reconstructs it with minimal loss, is used to reduce data dimensionality and learn meaningful hidden features. The CNN captures spatial patterns, while the LSTM models temporal dependencies in network traffic. Experiments were conducted on the CICIoT2023 dataset, focusing exclusively on reconnaissance attacks. The evaluation metrics are accuracy, precision, recall, specificity, False Positive Rate (FPR), False Negative Rate (FNR), and F1-score. The proposed model achieves an overall accuracy of 99.79%, a specificity of 0.9994, a precision of 0.9948, a recall of 0.9445, and an F1-score of 0.9648. Class-level analysis shows high performance across most attack types, although Ping Sweep exhibits a lower recall of 0.6853 despite perfect precision. These results indicate that the hybrid CNN–LSTM model with autoencoder-based feature extraction can effectively detect reconnaissance attacks in IoT networks. The approach combines high detection accuracy with few false alarms and provides a promising foundation for improving real-world IoT security monitoring systems.
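As a reading aid for the metrics listed above, the following minimal sketch computes them from the confusion counts of a single class under their standard definitions. It is not the study's evaluation code. The names `Counts` and `binary_metrics` are hypothetical, and the example counts are invented to mimic the Ping Sweep pattern of perfect precision with low recall; they are not taken from CICIoT2023.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one class (hypothetical helper type)."""
    tp: int  # attack flows correctly flagged
    fp: int  # benign flows wrongly flagged
    tn: int  # benign flows correctly passed
    fn: int  # attack flows missed


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def binary_metrics(c: Counts) -> Dict[str, float]:
    """Standard one-vs-rest metrics computed from confusion counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "fpr": _ratio(c.fp, c.fp + c.tn),
        "fnr": _ratio(c.fn, c.fn + c.tp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Illustrative counts only (assumption): no false positives, many misses.
example = binary_metrics(Counts(tp=685, fp=0, tn=9000, fn=315))
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With counts like these, precision stays at 1.0 because nothing benign is flagged, while recall and F1 drop because many attack flows are missed. FNR is the complement of recall and FPR is the complement of specificity, which is why a high specificity corresponds to few false alarms.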