Here you can see a quick skim of the data.

That's not quite as high a prediction accuracy as I wanted, so let's see if changing some hyperparameters improves it.
Here I learned that the best score I can get on the entire set is 91%, along with the parameters that achieve it. So let's see what those parameters get me this time.

Interesting: the score was lower with the “better” parameters. This could be because the first parameters were making some biased decisions on the data. Let’s see what score the second set of parameters gets on the test set.
Not bad: 92.3% of the test set was predicted correctly. Still, I would like to see higher prediction accuracy, since this deals with people’s health. In future updates I will be:

1. Messing around with the parameters more

2. Adding SHAP plots to see which variables are most important.

3. Checking whether there is any new data, since data is always changing and this model could be outdated.

Download the Python script here: https://github.com/Atowel-data/Heart-Disease-XGBoost
