Hybrid Deep Fixed K-Means (HDF-KMeans)
Abstract
K-Means is one of the most widely used clustering algorithms because of its simplicity, scalability, and computational efficiency. In practice, however, it suffers from several well-known limitations: high sensitivity to the choice of initial centroids, inconsistent results across runs, and suboptimal performance on high-dimensional or non-linearly separable data. This study introduces a hybrid clustering algorithm, Hybrid Deep Fixed K-Means (HDF-KMeans), to address these issues. The approach combines the advantages of two state-of-the-art techniques: Deep K-Means++ and Fixed Centered K-Means. Deep K-Means++ uses deep learning-based feature extraction to transform raw data into more meaningful representations and applies advanced (K-Means++) centroid initialization, improving clustering accuracy and adaptability to complex data structures. Fixed Centered K-Means complements it by locking selected centroids, chosen from domain knowledge or by adaptive strategies, which reduces run-to-run variability and makes convergence more consistent. Integrating the two methods yields a robust hybrid model with improved clustering accuracy and consistency. HDF-KMeans is evaluated on five benchmark medical datasets (Breast Cancer, COVID-19, Diabetes, Heart Disease, and Thyroid) using standard classification metrics: Accuracy, Precision, Recall, and F1-Score. The results show that HDF-KMeans outperforms traditional K-Means, K-Means++, and K-Means-SMOTE on all datasets, excelling in overall Accuracy and F1-Score. Although some trade-offs are observed in individual Precision or Recall scores, the model maintains a solid balance between them, indicating reliable performance. This study highlights HDF-KMeans as a promising and effective solution for complex clustering tasks, particularly in high-stakes domains such as healthcare and biomedical analysis.
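The abstract describes the hybrid only at a high level. The following is a minimal Python sketch, under our own assumptions, of how its two clustering ingredients might fit together: K-Means++ seeding for the free centroids, Lloyd iterations in which a user-supplied set of centroids stays locked, and a majority-vote mapping from clusters to classes so that Accuracy, Precision, Recall, and F1-Score can be computed for a binary task. The deep feature-extraction stage is omitted (the input is assumed to be already embedded), the majority-vote evaluation step is an assumption, and the function names `hybrid_fixed_kmeans`, `map_clusters_to_classes`, and `binary_metrics` are hypothetical; this is not the authors' implementation.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[Point]) -> int:
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def hybrid_fixed_kmeans(
    data: List[Point],
    k: int,
    fixed_centers: Sequence[Sequence[float]] = (),
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Hypothetical sketch: Lloyd's K-Means whose first len(fixed_centers)
    centers are locked. `data` is assumed to be already embedded (e.g. by a
    separately trained encoder); deep feature extraction is not done here."""
    if len(fixed_centers) > k:
        raise ValueError("more fixed centers than clusters")
    rng = random.Random(seed)
    centers = [list(c) for c in fixed_centers]
    n_fixed = len(centers)
    if not centers:
        centers.append(list(rng.choice(data)))  # standard K-Means++ first pick
    # K-Means++ seeding for the free centers, conditioned on those already
    # placed: sample with probability proportional to squared distance
    # from the nearest existing center.
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in data]
        if sum(d2) == 0:
            centers.append(list(rng.choice(data)))
        else:
            centers.append(list(rng.choices(data, weights=d2, k=1)[0]))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in data]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: only the free centers move; fixed centers stay locked.
        for j in range(n_fixed, k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def map_clusters_to_classes(labels: List[int], y_true: List[int]) -> List[int]:
    """Assumed evaluation step: relabel each cluster with its majority class."""
    votes: Dict[int, Counter] = {}
    for c, y in zip(labels, y_true):
        votes.setdefault(c, Counter())[y] += 1
    mapping = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    return [mapping[c] for c in labels]


def binary_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, Precision, Recall, and F1-Score for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    y = [0, 0, 0, 1, 1, 1]
    centers, labels = hybrid_fixed_kmeans(data, k=2, fixed_centers=[[0.0, 0.0]])
    print(centers[0])  # the fixed center is returned unchanged
    print(binary_metrics(y, map_clusters_to_classes(labels, y)))
```

Locking centroids only in the update step leaves the seeding rule intact: K-Means++ treats the fixed centroids as already-placed centers, so the free centroids tend to be sampled away from them.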
References
T. Mohammadi et al., “Unsupervised Machine Learning with Cluster Analysis in Patients Discharged after an Acute Coronary Syndrome: Insights from a 23,270-Patient Study,” The American Journal of Cardiology, vol. 193, pp. 44–51, Apr. 2023, doi: 10.1016/j.amjcard.2023.01.048.
H. Ismkhan and M. Izadi, “K-means-G*: Accelerating K-means clustering algorithm utilizing primitive geometric concepts,” Information Sciences, vol. 618, pp. 298–316, Dec. 2022, doi: 10.1016/j.ins.2022.11.001.
R. M. Alguliyev, R. M. Aliguliyev, and L. V. Sukhostat, “Parallel batch k-means for Big data clustering,” Computers & Industrial Engineering, vol. 152, p. 107023, Feb. 2021, doi: 10.1016/j.cie.2020.107023.
D. Abdullah, C. I. Erliana, A. Bintoro, H. Hartono, M. Ikhwani, and N. Nazaruddin, “Recipient Feasibility Decision Support System Micro Small Medium Business Assistance Use Method Analytic Hierarchy Process and Simple Additives Weighting,” JOIV : International Journal on Informatics Visualization, vol. 8, no. 4, pp. 2119–2124, Dec. 2024, doi: 10.62527/joiv.8.4.2321.
R. Cordeiro de Amorim and V. Makarenkov, “On K-means iterations and Gaussian clusters,” Neurocomputing, vol. 553, p. 126547, Oct. 2023, doi: 10.1016/j.neucom.2023.126547.
H. Hartono, Y. Risyani, E. Ongko, and D. Abdullah, “HAR-MI method for multi-class imbalanced datasets,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 18, no. 2, Art. no. 2, Apr. 2020, doi: 10.12928/telkomnika.v18i2.14818.
H. Hartono and E. Ongko, “Avoiding Overfitting dan Overlapping in Handling Class Imbalanced Using Hybrid Approach with Smoothed Bootstrap Resampling and Feature Selection,” JOIV : International Journal on Informatics Visualization, vol. 6, no. 2, pp. 343–348, Jun. 2022, doi: 10.30630/joiv.6.2.985.
M. K. Zuhanda, H. Hartono, S. A. R. S. Hasibuan, D. Abdullah, P. U. Gio, and R. E. Caraka, “Bibliometric analysis of model vehicle routing problem in logistics delivery,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 37, no. 1, Art. no. 1, Jan. 2025, doi: 10.11591/ijeecs.v37.i1.pp590-600.
M. K. Zuhanda, H. Mawengkang, S. Suwilo, Mardiningsih, and O. S. Sitompul, “Logistics distribution supply chain optimization model with VRP in the context of E-commerce,” Accessed: May 15, 2025. [Online]. Available: https://pubs.aip.org/aip/acp/article/2714/1/020049/2889719/Logistics-distribution-supply-chain-optimization
M. K. Zuhanda et al., “Optimization of Vehicle Routing Problem in the Context of E-commerce Logistics Distribution,” EBSCOhost. Accessed: May 15, 2025. [Online]. Available: https://openurl.ebsco.com/contentitem/gcd:162370971?sid=ebsco:plink:crawler&id=ebsco:gcd:162370971
A. P. U. Siahaan et al., “Comparative study of prim and genetic algorithms in minimum spanning tree and travelling salesman problem,” International Journal of Engineering and Technology (UAE), vol. 7, no. 4, pp. 3654–3661, 2018, doi: 10.14419/ijet.v7i4.20606.
K. Deeparani and P. Sudhakar, “Efficient image segmentation and implementation of K-means clustering,” Materials Today: Proceedings, vol. 45, pp. 8076–8079, Jan. 2021, doi: 10.1016/j.matpr.2021.01.154.
L. Ghosh and D. Konar, “Efficient fuzzy-pruned high dimensional clustering with minimal distance measure,” Expert Systems with Applications, vol. 243, p. 122748, Jun. 2024, doi: 10.1016/j.eswa.2023.122748.
A. Fahim, “K and starting means for k-means algorithm,” Journal of Computational Science, vol. 55, p. 101445, Oct. 2021, doi: 10.1016/j.jocs.2021.101445.
B. Sadeghi, “Clustering in geo-data science: Navigating uncertainty to select the most reliable method,” Ore Geology Reviews, vol. 181, p. 106591, Jun. 2025, doi: 10.1016/j.oregeorev.2025.106591.
C. X. Gao et al., “An overview of clustering methods with guidelines for application in mental health research,” Psychiatry Research, vol. 327, p. 115265, Sep. 2023, doi: 10.1016/j.psychres.2023.115265.
M. J. Simanullang, Hartono, and R. M.I.T, “Combination of SOM, SVR, and LMKNN for Stock Price Prediction,” in 2023 International Conference of Computer Science and Information Technology (ICOSNIKOM), Nov. 2023, pp. 1–5, doi: 10.1109/ICoSNIKOM60230.2023.10364522.
Y. Dai, L. Yang, and Y. Cao, “Long baseline underwater source localization based on deep K-Means++ clustering in complex underwater environments,” Digital Signal Processing, vol. 164, p. 105281, Sep. 2025, doi: 10.1016/j.dsp.2025.105281.
Z.-Z. Long, G. Xu, J. Du, H. Zhu, T. Yan, and Y.-F. Yu, “Flexible Subspace Clustering: A Joint Feature Selection and K-Means Clustering Framework,” Big Data Research, vol. 23, p. 100170, Feb. 2021, doi: 10.1016/j.bdr.2020.100170.
A. Shahcheraghian, A. Ilinca, and N. Sommerfeldt, “K-means and agglomerative clustering for source-load mapping in distributed district heating planning,” Energy Conversion and Management: X, vol. 25, p. 100860, Jan. 2025, doi: 10.1016/j.ecmx.2024.100860.
M. Salman, “A novel clustering method with consistent data in a three-dimensional graphical format over existing clustering mechanisms,” Information Sciences, vol. 649, p. 119634, Nov. 2023, doi: 10.1016/j.ins.2023.119634.
M. Ay, L. Özbakır, S. Kulluk, B. Gülmez, G. Öztürk, and S. Özer, “FC-Kmeans: Fixed-centered K-means algorithm,” Expert Systems with Applications, vol. 211, p. 118656, Jan. 2023, doi: 10.1016/j.eswa.2022.118656.
A. Luque, A. Carrasco, A. Martín, and A. de las Heras, “The impact of class imbalance in classification performance metrics based on the binary confusion matrix,” Pattern Recognition, vol. 91, pp. 216–231, Jul. 2019, doi: 10.1016/j.patcog.2019.02.023.
Z. Xu, D. Shen, T. Nie, and Y. Kou, “A hybrid sampling algorithm combining M-SMOTE and ENN based on Random forest for medical imbalanced data,” Journal of Biomedical Informatics, vol. 107, p. 103465, Jul. 2020, doi: 10.1016/j.jbi.2020.103465.
“UCI Machine Learning Repository.” Accessed: May 15, 2025. [Online]. Available: https://archive.ics.uci.edu/
Y. Chen, C. Lin, J. Liu, and D. Yu, “One-hour-ahead solar irradiance forecast based on real-time K-means++ clustering on the input side and CNN-LSTM,” Journal of Atmospheric and Solar-Terrestrial Physics, vol. 266, p. 106405, Jan. 2025, doi: 10.1016/j.jastp.2024.106405.
D. S. Turan and B. Ordin, “The incremental SMOTE: A new approach based on the incremental k-means algorithm for solving imbalanced data set problem,” Information Sciences, vol. 711, p. 122103, Sep. 2025, doi: 10.1016/j.ins.2025.122103.
DOI: https://doi.org/10.52088/ijesty.v5i3.913
Copyright (c) 2025 Muhammad Khahfi Zuhanda, Kelvin Leonardi Kohsasih, Pieter Octaviandy, Hartono Hartono, Dian Kurnia, Nurliana Tarigan, Manan Ginting, Manahan Hutagalung