Cerebral stroke is a major global health problem and a leading cause of death and long-term disability, which makes it important to identify people at high risk early so that strokes can be prevented or treated quickly. This study uses Kaggle's Stroke Prediction Dataset, which includes gender, age, hypertension, heart disease, glucose level, body mass index (BMI), and smoking status, to estimate how likely a person is to have a stroke. Categorical features are label-encoded and one-hot encoded, missing BMI values are filled by Random Forest imputation, and the data are then visualized using class distribution charts and correlation matrices. The dataset is normalized with MinMax, Standard, and Robust scalers, and class imbalance is addressed with the oversampling methods SMOTE, ADASYN, and random oversampling. Several machine learning and deep learning models are evaluated: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, Gaussian Naive Bayes, 1D-CNN, GRU, LSTM, ANN, and BiRNN. Ensemble learning methods are explored to improve the predictions further. A Voting classifier that combines XGBoost, Random Forest, and AdaBoost performs best, achieving 99.4% accuracy, precision, recall, F1-score, and ROC-AUC. The explainable AI techniques LIME and SHAP are used to assess the importance of each feature, and a web-based interface built on the Flask framework with an SQLite database lets users obtain predictions in real time.
Keywords : Stroke prediction, Machine learning, Ensemble learning, Deep learning, Explainable AI, Flask framework, Data preprocessing, Health informatics
Authors : R Venkatesh, K Dhanamjay, K Yatheendra
Title : Early Detection of Stroke Risk Through Intelligent Health Data Modeling and Machine Learning
Volume/Issue : 2026;03(04)
Page No : 308-318
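
Two of the preprocessing steps named in the abstract, min-max normalization and random oversampling, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `min_max_scale` and `random_oversample`, the fixed seed, and the toy rows are all assumptions made for illustration, and the numbers are not taken from the Stroke Prediction Dataset.

```python
import random
from collections import Counter


def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: [age, average glucose level]; label 1 = stroke.
rows = [[67, 228.7], [45, 95.1], [33, 80.4], [58, 110.2], [29, 77.0]]
labels = [1, 0, 0, 0, 0]

# Scale each feature column independently, then rebuild the rows.
ages = min_max_scale([r[0] for r in rows])
glucose = min_max_scale([r[1] for r in rows])
scaled = [list(pair) for pair in zip(ages, glucose)]

balanced_rows, balanced_labels = random_oversample(scaled, labels)
print(Counter(balanced_labels))
```

In practice, oversampling is usually applied to the training split only, so that duplicated minority rows do not also appear in the data used for evaluation.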