Leukemia is among the cancers with the highest mortality rates worldwide, so identifying its subtypes is crucial for accurate diagnosis and effective treatment. Analysis of high-dimensional gene expression data, such as the CuMiDa dataset, remains challenging because of overlapping expression patterns and limited sample sizes. This study applies Bayesian optimization with Optuna to tune the hyperparameters of a Spectral Clustering–K-Means method, with the aim of improving the clustering of leukemia subtypes. Four hyperparameters (n_components, affinity method, n_neighbors, and gamma) were optimized over 1,000 iterations. The best configuration used n_components = 5 and the Nearest Neighbors affinity with n_neighbors = 6. The resulting Spectral Embedding matrix was then clustered with K-Means. This approach achieved a clustering accuracy of 92.19%, outperforming K-Means and Hierarchical Clustering applied on their own. Heatmap visualization showed that the optimized method grouped samples with similar gene expression patterns. These results indicate that combining Spectral Clustering–K-Means with Bayesian optimization using Optuna can improve clustering quality for complex gene expression data and may be applicable to other bioinformatics studies.
Copyright © 2025