The hidden Markov model (HMM) is widely used for sequence modeling in text classification tasks. This study investigates the impact of different smoothing techniques, namely Laplace smoothing, absolute discounting, and Gibbs sampling, on HMM performance across three distinct domains: e-commerce products, spam filtering, and occupational data mining. In the comparative analysis, Laplace smoothing consistently outperformed the other techniques in handling zero-probability issues, achieving the best results on the e-commerce and SMS spam datasets. The HMM without any smoothing achieved the best results for job title classification. This divergence underscores the dataset-specific nature of smoothing requirements: simple maximum-likelihood parameter estimation remains effective in contexts characterized by a limited and repetitive vocabulary. The findings therefore suggest that tailored smoothing strategies are crucial for optimizing HMM performance across different textual analysis applications.
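For concreteness, the sketch below illustrates the zero-probability issue the comparison centers on: Laplace (add-alpha) smoothing of HMM emission probabilities, which guarantees a nonzero probability for words never observed with a given state. This is a minimal illustrative sketch, not the study's implementation; the function name, the toy counts, and the `<UNK>` handling are assumptions for demonstration.

```python
from collections import Counter

def laplace_emission_probs(tag_word_counts, vocab_size, alpha=1.0):
    """Estimate P(word | tag) with Laplace (add-alpha) smoothing.

    Without smoothing, a word never seen with a tag gets probability 0,
    which zeroes out every path through that state during decoding.
    Adding alpha to every count guarantees a small nonzero probability.
    """
    probs = {}
    for tag, word_counts in tag_word_counts.items():
        total = sum(word_counts.values())
        probs[tag] = {
            word: (count + alpha) / (total + alpha * vocab_size)
            for word, count in word_counts.items()
        }
        # Probability mass reserved for any word unseen with this tag.
        probs[tag]["<UNK>"] = alpha / (total + alpha * vocab_size)
    return probs

# Toy counts standing in for a tagged training corpus (illustrative only).
counts = {
    "SPAM": Counter({"free": 30, "winner": 12}),
    "HAM": Counter({"meeting": 25, "free": 2}),
}
emissions = laplace_emission_probs(counts, vocab_size=4)
# "winner" never co-occurs with HAM, yet its probability is nonzero:
print(emissions["HAM"].get("winner", emissions["HAM"]["<UNK>"]))
```

With alpha = 0 this reduces to the unsmoothed maximum-likelihood estimate, which, as the results for job title classification suggest, can suffice when the vocabulary is small and repetitive enough that unseen state-word pairs are rare.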