Crowd counting plays an important role in public safety monitoring, traffic management, and intelligent surveillance systems. However, accurate density estimation remains difficult in highly congested scenes because of severe occlusion, large scale variation, and complex backgrounds. Although recent deep-learning methods achieve high performance, many of them do not use computationally efficient backbone networks and instead depend on an external teacher-student distillation architecture, which limits their use in resource-constrained applications. To address this problem, we introduce LSKD, a lightweight self-knowledge distillation network designed specifically for density map regression. Unlike conventional teacher-dependent approaches, LSKD performs internal multi-level feature alignment within a single compact network and does not require an external teacher model. The architecture integrates a Feature Matching Block (FMB) and a Context Fusion (CoFuse) block to strengthen hierarchical feature matching and global context awareness. Extensive experiments demonstrate that LSKD obtains competitive performance with only 2.65 million parameters and 10.23 GFLOPs. In particular, it achieves an MAE of 63.17 on ShanghaiTech Part A, 8.94 on ShanghaiTech Part B, 143.7 on UCF-QNRF, and 223.88 on UCF-CC-50, a favorable trade-off between accuracy and computational efficiency. These results indicate that LSKD is a practical and efficient solution for real-time crowd counting on edge devices.
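For readers unfamiliar with the reported metric, the following minimal sketch illustrates how MAE is commonly computed in density-map-based crowd counting. It assumes the usual convention that the predicted count of an image is the sum of its density map; the function names and the toy data are hypothetical and are not taken from the LSKD implementation.

```python
def count_from_density(density_map):
    """Predicted count = sum over all pixels of the density map (assumed convention)."""
    return sum(sum(row) for row in density_map)


def mean_absolute_error(pred_maps, true_counts):
    """Average absolute difference between predicted and ground-truth counts."""
    if not pred_maps or len(pred_maps) != len(true_counts):
        raise ValueError("need equally many predictions and ground-truth counts")
    errors = [
        abs(count_from_density(pred) - truth)
        for pred, truth in zip(pred_maps, true_counts)
    ]
    return sum(errors) / len(errors)


# Toy example (hypothetical data): two 2x2 density maps and their true counts.
toy_pred_maps = [
    [[0.5, 0.5], [1.0, 0.0]],  # sums to 2.0
    [[2.0, 1.0], [0.0, 0.0]],  # sums to 3.0
]
toy_true_counts = [3, 2]
toy_mae = mean_absolute_error(toy_pred_maps, toy_true_counts)
```

Under this convention, a lower MAE means the summed density maps are closer on average to the annotated head counts.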