Application of the XGBoost Algorithm for Diabetes Prediction: Confusion Matrix and ROC Curve Analysis
DOI: https://doi.org/10.21111/fij.v10i1.14311
Abstract
Diabetes mellitus is a chronic metabolic disorder and a growing global health concern with steadily rising prevalence. Early prediction and accurate diagnosis are essential for effective disease management and the prevention of complications. This study presents a methodological framework for optimizing the XGBoost algorithm to improve the accuracy of diabetes prediction while minimizing misclassification, with particular emphasis on reducing false negatives because of their significant clinical implications. Our machine learning methodology combines comprehensive data preprocessing, systematic hyperparameter optimization through grid search, and rigorous model evaluation using confusion matrix analysis and the ROC-AUC metric. The Pima Indians Diabetes Database is partitioned using a 70:30 train-test split to evaluate how well the model generalizes to held-out data. The optimized XGBoost model achieves the following performance: accuracy (96.33%), precision (93.4%), recall (97.16%), F1 score (95.7%), and ROC-AUC score (0.99). A detailed analysis of the confusion matrix shows 205 true positives and 373 true negatives, with only 16 false positives and 6 false negatives, indicating strong diagnostic capability. These findings suggest that the optimized XGBoost algorithm is a valuable decision support tool for healthcare practitioners in the early detection of diabetes. Although the model shows excellent overall performance, further reduction of false negatives remains an important target for improving clinical safety.
The study contributes to medical data science by establishing a robust, optimized framework for diabetes prediction using advanced machine learning techniques, with potential applications in clinical decision support systems and preventive healthcare strategies.
Keywords: Diabetes mellitus, XGBoost, confusion matrix, ROC-AUC, hyperparameter optimization.
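The relationship between the four confusion-matrix counts and the headline metrics can be checked with a few lines of arithmetic. The sketch below is only an illustration using the standard textbook definitions; the names `ConfusionCounts` and `summarize` are hypothetical and are not taken from the paper. Computed this way, accuracy and recall match the reported 96.33% and 97.16%, while precision and F1 come out near 92.8% and 94.9%, slightly below the reported 93.4% and 95.7%; the abstract does not state how those two figures were derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def summarize(c: ConfusionCounts) -> dict:
    """Derive standard metrics, as fractions in [0, 1], from the four counts."""
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp)   # share of predicted positives that are correct
    recall = c.tp / (c.tp + c.fn)      # share of actual positives that are detected
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts as stated in the abstract: 205 TP, 373 TN, 16 FP, 6 FN.
reported = ConfusionCounts(tp=205, tn=373, fp=16, fn=6)
metrics = summarize(reported)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because recall depends only on true positives and false negatives, it is the metric most directly tied to the clinical concern about missed diagnoses that the abstract highlights.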
License
Copyright (c) 2025 Erliyan Redy Susanto, Agum Cahyana

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The rights and licenses for the Fountain of Informatics Journal (FIJ) are set out below. By submitting the article/manuscript, the author(s) agree to this policy. No specific document sign-off is required.
1. License
Non-commercial use of the article is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Author(s)' Warranties
The author warrants that the article is original, written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author, and free of any third party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
3. User/Public Rights
FIJ's aim is to disseminate published articles as freely as possible. Under the Creative Commons license, FIJ permits users to copy, distribute, display, and perform the work for non-commercial purposes only. Users must also attribute the authors and FIJ when distributing works in the journal and other publication media. Unless otherwise stated, the authors are public entities as soon as their articles are published.
4. Rights of Authors
Authors retain all their rights to the published works, including (but not limited to) the following rights:
- Copyright and other proprietary rights relating to the article, such as patent rights,
- The right to use the substance of the article in the author's own future works, including lectures and books,
- The right to reproduce the article for the author's own purposes,
- The right to self-archive the article (please read our deposit policy),
- The right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal (Jurnal Optimasi Sistem Industri).
5. Co-Authorship
If the article was jointly prepared by more than one author, any author submitting the manuscript warrants that he/she has been authorized by all co-authors to agree to this copyright and license notice (agreement) on their behalf, and agrees to inform his/her co-authors of the terms of this policy. FIJ will not be held liable for anything that may arise from internal disputes among the author(s). FIJ will only communicate with the corresponding author.
6. Royalties
Because FIJ is an open-access journal that disseminates articles for free under the Creative Commons license terms mentioned above, the author(s) acknowledge that FIJ entitles the author(s) to no royalties or other fees.
7. Miscellaneous
FIJ will publish the article (or have it published) in the journal if the article's editorial process is successfully completed. FIJ's editors may modify the article to a style of punctuation, spelling, capitalization, referencing, and usage that they deem appropriate. The author acknowledges that the article may be published so that it will be publicly accessible, and that such access will be free of charge for readers as mentioned in point 3.








