Vol. 20, No. 2, February 28, 2026
DOI: 10.3837/tiis.2026.02.004
Abstract
Non-communicable diseases (NCDs), such as cardiovascular diseases, cancers, chronic respiratory diseases, and diabetes, are responsible for 71% of global deaths. Early prediction of these diseases is critical for effective management and for reducing healthcare costs. However, early prediction from electronic health record (EHR) data is difficult because the records are sparse, sequential, and heavily imbalanced. Traditional models handle these issues poorly, especially when the data represent single events, which reduces predictive performance. This study presents a data augmentation method that integrates a sliding-window approach with Borderline-SMOTE to address these challenges, using MIMIC-IV as a proof-of-concept evaluation. The method converts sparse records into sequential inputs and balances the class distribution, aiming to improve early prediction accuracy. By improving the quality of sequential data inputs, the proposed technique facilitates timely interventions and better patient outcomes. The study assesses several machine learning models (LSTM, GRU, Random Forest, XGBoost, and KNN) trained on the augmented datasets; XGBoost performs best, reaching 94.6% on accuracy, precision, recall, and F1-score. These results highlight the effectiveness of the proposed approach in the early prediction of NCDs.
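The two steps named in the abstract, windowing sparse records into sequences and oversampling borderline minority cases, can be illustrated with a small standard-library sketch. This is not the authors' implementation: the abstract does not say how windows are encoded or how the two steps are combined, so the function names (`sliding_windows`, `flatten`, `nearest`, `borderline_smote`), the choice to flatten each window into one vector before oversampling, the parameter `k`, and the binary-label setting are all assumptions made for illustration. The oversampler is a simplified variant of Borderline-SMOTE that draws borderline points at random until the classes are balanced.

```python
import math
import random


def sliding_windows(visits, window, step=1):
    """Split one patient's time-ordered visit vectors into overlapping windows."""
    return [visits[i:i + window]
            for i in range(0, len(visits) - window + 1, step)]


def flatten(window):
    """Concatenate the visit vectors in a window into one flat feature vector."""
    return [value for visit in window for value in visit]


def nearest(X, i, candidates, k):
    """Indices of the k candidates closest to X[i] (Euclidean), excluding i."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def borderline_smote(X, y, minority=1, k=3, seed=0):
    """Simplified Borderline-SMOTE for binary labels (illustrative assumption).

    Returns new lists; the inputs are left unchanged.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    everyone = range(len(X))

    # "Danger" points: minority samples for which at least half, but not all,
    # of the k nearest neighbours belong to the majority class.
    danger = []
    for i in minority_idx:
        n_major = sum(1 for j in nearest(X, i, everyone, k) if y[j] != minority)
        if k / 2 <= n_major < k:
            danger.append(i)

    # Number of synthetic samples needed to equalise the two classes.
    n_new = max(0, len(y) - 2 * len(minority_idx))
    synthetic = []
    if danger and len(minority_idx) > 1:
        for _ in range(n_new):
            i = rng.choice(danger)
            # Interpolate towards one of the nearest minority neighbours.
            j = rng.choice(nearest(X, i, minority_idx, k))
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)
```

Under these assumptions, each patient's windows would be flattened with `flatten`, pooled into `X` with one label per window in `y`, and passed to `borderline_smote` before model training. For real use, a maintained implementation such as `BorderlineSMOTE` in the imbalanced-learn package would likely be preferable to this sketch.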
Cite this article
[IEEE Style]
T. Y. Ooi, J. K. Chaw, H. S. Gan, T. T. Ting, A. O. Salau, A. Ali, "Dynamic Data Augmentation for Early Prediction of Non-Communicable Diseases with Sequence Structure," KSII Transactions on Internet and Information Systems, vol. 20, no. 2, pp. 685-714, 2026. DOI: 10.3837/tiis.2026.02.004.
[ACM Style]
Tze Yaang Ooi, Jun Kit Chaw, Hong Seng Gan, Tin Tin Ting, Ayodeji Olalekan Salau, and Aitizaz Ali. 2026. Dynamic Data Augmentation for Early Prediction of Non-Communicable Diseases with Sequence Structure. KSII Transactions on Internet and Information Systems, 20, 2, (2026), 685-714. DOI: 10.3837/tiis.2026.02.004.
[BibTeX Style]
@article{tiis:105891, title="Dynamic Data Augmentation for Early Prediction of Non-Communicable Diseases with Sequence Structure", author="Tze Yaang Ooi and Jun Kit Chaw and Hong Seng Gan and Tin Tin Ting and Ayodeji Olalekan Salau and Aitizaz Ali", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2026.02.004}, volume={20}, number={2}, year="2026", month={February}, pages={685-714}}