Research-article
Pages: 28-33
A Decision Tree Ensemble Approach to Diabetes Prediction using the Framingham Heart Dataset, Exploring the Role of AI-Associated Interventions in Reducing Diabetes-Related Adverse Outcomes Between Men and Women
Authors: Y. Talbert Patricia, Reid Korin, Smith Donald
Abstract:
Objective
Diabetes poses significant public health challenges, with many individuals remaining undiagnosed and at risk of complications. This study aimed to evaluate the performance of decision tree ensemble methods for predicting diabetes onset using the Framingham Heart Study Teaching Dataset and to explore sex-specific risk patterns relevant to AI-driven interventions.
Methods
We analyzed data from 11,627 participants, incorporating demographics, vital signs, smoking status, medication use, and laboratory measures. Random Forest classifiers were developed to predict diabetes incidence at approximately 6-year (Period 2) and 12-year (Period 3) follow-ups. Class imbalance was addressed using undersampling, oversampling, and the Synthetic Minority Over-sampling Technique (SMOTE).
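As a rough illustration of the resampling step, the sketch below implements plain random oversampling and undersampling with the Python standard library. It is not the authors' pipeline: the function names, the toy records, and the seed are assumptions made for this example, and SMOTE (which synthesizes new minority cases by interpolating between neighbours rather than duplicating rows) is not shown.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so every class matches the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 9 records without diabetes (0) and 3 with diabetes (1).
# The feature values are made up and carry no clinical meaning.
rows = [[70 + i, 22.0 + i] for i in range(12)]
labels = [0] * 9 + [1] * 3

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # both classes now have 9 records
print(Counter(under_labels))  # both classes now have 3 records
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keep the original class balance.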
Results
The models achieved an Area Under the Curve (AUC) of 0.856 in Period 2 and showed moderate predictive ability in Period 3 (AUC = 0.732 in males, 0.786 in females). Key predictors included glucose level, BMI, systolic blood pressure, age, and heart rate. Predictive accuracy differed between men and women, suggesting potential sex-specific vulnerabilities that merit further study.
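To make the sex-stratified comparison concrete, the following sketch computes an AUC separately for each group. It assumes the reported AUC is the area under the ROC curve, computed here as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(y_true, scores, groups):
    """Compute the AUC separately within each group label (for example, sex)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = roc_auc([y_true[i] for i in idx], [scores[i] for i in idx])
    return result


# Hypothetical toy example: outcome labels, predicted risk scores, and sex.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1, 0.3, 0.7]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]

result = auc_by_group(y_true, scores, sex)
print(result)  # {'F': 1.0, 'M': 0.75}
```

The pairwise loop is quadratic in group size, which is acceptable for illustration; a rank-based formulation gives the same value more efficiently on large cohorts.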
Conclusion
Machine learning approaches, particularly Random Forests, show promise for medium- and long-term diabetes risk prediction, supporting early identification and intervention efforts. Future work should focus on hyperparameter tuning and explainability techniques, such as SHapley Additive exPlanations (SHAP) values, to improve model precision, interpretability, and fairness. Equity-focused strategies remain critical to ensure AI-driven tools benefit diverse populations and do not exacerbate existing disparities in diabetes care.
Published:
Dec 29, 2025
Open Access