We're using the Cleveland Heart Disease dataset. The Kaggle version has 1,025 rows, but 723 of them are duplicates! Removing them leaves 302 unique patients. The goal: predict whether each patient has heart disease from 13 clinical measurements.
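As a rough sketch of the de-duplication step, assuming the data is loaded into a pandas DataFrame: the tiny table and its column names (`age`, `chol`, `target`) are made up for illustration and are not the real file.

```python
import pandas as pd

# Toy stand-in for the raw file: rows 0 and 2 are exact duplicates.
# Column names and values are illustrative assumptions, not the real data.
raw = pd.DataFrame({
    "age": [63, 45, 63, 58],
    "chol": [233, 210, 233, 250],
    "target": [1, 0, 1, 1],
})

# Keep the first occurrence of every fully identical row.
clean = raw.drop_duplicates().reset_index(drop=True)

removed = len(raw) - len(clean)
print(f"{len(raw)} rows -> {len(clean)} unique ({removed} duplicates removed)")
```

On the real file the same call would be expected to take 1,025 rows down to 302.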
Before any learning can happen, the data needs to be cleaned and converted. The models need numeric input, so each categorical column is replaced by one binary yes/no column per category using one-hot encoding.
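One way to do the encoding is with `pandas.get_dummies`. This is a minimal sketch: the column `cp` and its category labels are assumptions chosen for illustration, and the original pipeline may have used a different tool.

```python
import pandas as pd

# Illustrative frame: "age" is already numeric, "cp" is categorical.
df = pd.DataFrame({
    "age": [63, 45, 58],
    "cp": ["typical", "asymptomatic", "typical"],
})

# Each category of "cp" becomes its own 0/1 column; "age" is left untouched.
encoded = pd.get_dummies(df, columns=["cp"], dtype=int)
print(encoded)
```

Every row has exactly one `1` across the new `cp_*` columns, which is why the total column count grows beyond the original 13 measurements.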
Three models are trained on the same 241 rows (about 80% of the 302 patients). They differ in only two ways: how they choose which feature to split on at each decision node, and whether branches are pruned afterwards.
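To make "how they choose which feature to split on" concrete, here is a small sketch of two common split criteria, entropy (used for information gain) and Gini impurity (the usual CART criterion). Which criterion each of the three models uses is an assumption here, and the labels are toy values.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_drop(parent, left, right, impurity):
    """How much a split reduces impurity (size-weighted average of the children)."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

# Toy node: 3 sick (1) and 3 healthy (0) patients, split into two children.
parent = [1, 1, 1, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0]

print("information gain:", impurity_drop(parent, left, right, entropy))
print("Gini decrease:   ", impurity_drop(parent, left, right, gini))
```

At each node, a tree scores every candidate split this way and keeps the one with the largest drop.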
Each model is evaluated on the 61 test patients it never saw during training. Four metrics give a fuller picture than accuracy alone, which can hide class-specific failures.
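The sketch below assumes the four metrics are accuracy, precision, recall, and F1, a common set for binary classification. The counts are invented to show how accuracy can look fine while one class fails completely; they are not results from this experiment.

```python
def four_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical model that calls all 10 patients healthy; 2 are actually sick.
scores = four_metrics(tp=0, fp=0, fn=2, tn=8)
print(scores)
```

Accuracy comes out at 0.8, yet recall is 0.0: every sick patient was missed.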
A confusion matrix sorts predictions into four boxes: true positives, false positives, true negatives, and false negatives. For heart disease the most important box is false negatives: real disease cases the model missed. These are the dangerous errors, because the patient goes home thinking they're fine.
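Here is a minimal sketch of how the four boxes are counted, treating label `1` as "has heart disease" (an assumption about the encoding). The labels and predictions are toy values.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            # Patient really is sick: caught (TP) or missed (FN).
            counts["TP" if guess == positive else "FN"] += 1
        else:
            # Patient is healthy: false alarm (FP) or correctly cleared (TN).
            counts["FP" if guess == positive else "TN"] += 1
    return counts

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

boxes = confusion_counts(y_true, y_pred)
print(boxes)
print("missed disease cases (FN):", boxes["FN"])
```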
Pruning has the biggest impact of anything in this experiment. Without it, CART memorises the 241 training patients, noise included, and then struggles on new ones. Pruning removes branches that don't reflect genuine patterns.
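One way to see the effect is scikit-learn's cost-complexity pruning, controlled by `ccp_alpha`. This sketch uses synthetic data of the same size (302 rows, 13 features), and the value `ccp_alpha=0.01` is an arbitrary assumption. The pruning method and settings used in the actual experiment may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise (flip_y); not the real dataset.
X, y = make_classification(n_samples=302, n_features=13, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Same CART-style tree twice: fully grown vs. cost-complexity pruned.
full = DecisionTreeClassifier(criterion="gini", random_state=0)
pruned = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=0)
full.fit(X_train, y_train)
pruned.fit(X_train, y_train)

for name, tree in [("unpruned", full), ("pruned", pruned)]:
    print(f"{name:9s} nodes={tree.tree_.node_count:3d} "
          f"train={tree.score(X_train, y_train):.3f} "
          f"test={tree.score(X_test, y_test):.3f}")
```

A large gap between train and test accuracy for the unpruned tree is the usual sign of memorisation. In practice `ccp_alpha` would be tuned, for example with cross-validation, rather than fixed by hand.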
After training, the pruned CART model tells us exactly which features it actually used. Remarkably, only 3 of the 28 encoded features do all the work. Every branch that relied on the others was pruned away as noise. This makes the model easy to interpret.
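With scikit-learn, the features a fitted tree actually uses can be read from `feature_importances_`: any feature that never appears in a split gets an importance of zero. The sketch below uses synthetic data with 28 columns and placeholder feature names, so the features it reports are not the ones from the experiment.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 28 columns, only a few of which carry signal.
X, y = make_classification(n_samples=300, n_features=28, n_informative=3,
                           n_redundant=0, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

# ccp_alpha is an assumed pruning strength, chosen only for illustration.
tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

# Keep only features with non-zero importance, largest first.
used = sorted(
    ((name, float(weight))
     for name, weight in zip(names, tree.feature_importances_) if weight > 0),
    key=lambda pair: pair[1], reverse=True)

print(f"{len(used)} of {len(names)} features used")
for name, weight in used:
    print(f"  {name}: {weight:.3f}")
```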
Here's the complete picture. Each model has its strengths, but for a medical classification task where missing a diagnosis is the worst possible error, CART Pruned wins clearly.