This study presents a novel application of parallel clustering algorithms to segment stocks in the Chinese A-share market based on financial indicators. Using the Hadoop platform and the Mahout software library, we implemented the K-means and fuzzy K-means algorithms and compared their performance across five distance measures: Euclidean, squared Euclidean, Manhattan, cosine, and Tanimoto. The analysis used 15 financial indicators from 2,544 listed companies, covering profitability, solvency, growth capability, asset management quality, and shareholder profitability. The experimental results show that, for clustering stock financial data, the K-means algorithm achieves its best execution efficiency and clustering quality with the Tanimoto distance, whereas the fuzzy K-means algorithm performs best with the squared Euclidean distance. Overall, the K-means algorithm proved more effective: it categorized 1,483 stocks into 26 meaningful segments, whereas fuzzy K-means categorized only 511 stocks into 27 segments. The resulting stock segmentation framework divides the market into eight comprehensive categories based on investment value and security, thereby providing investors with practical guidance for stock selection. Our approach enables investors to understand the fundamental characteristics of each stock segment, discern their distinctive features, and identify undervalued stocks with appreciation potential. This study represents the first application of parallel big data clustering algorithms to segment the entire Chinese A-share market, offering significant practical value for investment decision-making.
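For readers unfamiliar with the five distance measures compared above, the following is a minimal single-machine sketch in Python using their standard definitions. It is not the Mahout implementation used in the study; the function names, the `DISTANCES` table, and the `nearest_center` helper are illustrative assumptions, and the cosine and Tanimoto functions assume non-zero input vectors.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # 1 - cos(angle between a and b); assumes neither vector is all zeros.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def tanimoto(a, b):
    # 1 - (a.b) / (|a|^2 + |b|^2 - a.b); assumes the vectors are not both zero.
    ab = dot(a, b)
    return 1.0 - ab / (dot(a, a) + dot(b, b) - ab)


# Hypothetical lookup table so a clustering loop can switch measures by name.
DISTANCES = {
    "euclidean": euclidean,
    "squared_euclidean": squared_euclidean,
    "manhattan": manhattan,
    "cosine": cosine,
    "tanimoto": tanimoto,
}


def nearest_center(point, centers, distance):
    """Index of the center closest to `point` (the K-means assignment step)."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))
```

In this sketch, switching the distance measure changes only the function passed to the assignment step, for example `nearest_center(indicators, centers, DISTANCES["tanimoto"])`, where `indicators` would be one stock's vector of financial indicators.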
Copyrights © 2025