Commit
Penalty on shap calculation for higher variance (#218)
PR addresses #216

**Overall objective:** When computing feature importance, add a penalty to features that have high variance in their underlying SHAP values. This should (in theory) encourage selection of features whose importance is more coherent across CV folds.

**API design:**
- A single parameter, `shap_variance_penalty_factor`, both enables/disables the penalty and controls how much penalty is applied to the mean(shap) computation.
- Formula: `penalized_shap_abs_mean = (mean_shap - (std_shap * shap_variance_penalty_factor))`
- `shap_variance_penalty_factor` = None (default): no penalty
- `shap_variance_penalty_factor` = 0: same as None, no penalty
- `shap_variance_penalty_factor` = 1: the mean is penalized by one standard deviation

**Work tasks:**
- [x] Add `shap_variance_penalty_factor` parameter to `ShapRFECV.fit`, `compute` & `fit_compute`
- [x] Update `EarlyStoppingShapRFECV.fit`, `compute` & `fit_compute`
- [x] Add `shap_variance_penalty_factor` to `shap_helpers.calculate_shap_importance`
- [x] Update `ShapModelInterpreter.fit`, `compute` & `fit_compute`
- [x] Add to docstrings throughout
- [x] Run simulation demonstrating impact of `shap_variance_penalty_factor` feature
- [x] Add unit tests
- [x] Add tutorial comparing RFECV methods and results as a Jupyter notebook

**Reviewers:** Let me know of any changes or improvements that can be made.

---------

Co-authored-by: Reinier Koops <info@reinier.work>
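For illustration, the penalty formula from the API design can be sketched in a few lines of NumPy. This is a minimal sketch, not the code in `shap_helpers.calculate_shap_importance`: the function name `penalized_shap_importance` is hypothetical, and it assumes that both `mean_shap` and `std_shap` are taken over the absolute SHAP values per feature, and that the population standard deviation (NumPy's default) is used.

```python
import numpy as np


def penalized_shap_importance(shap_values, shap_variance_penalty_factor=None):
    """Hypothetical helper: per-feature mean(|shap|) minus a variance penalty.

    shap_values: array of shape (n_samples, n_features).
    Assumption: mean and std are both computed on absolute SHAP values.
    """
    abs_shap = np.abs(np.asarray(shap_values, dtype=float))
    mean_shap = abs_shap.mean(axis=0)

    # None and 0 both mean "no penalty", matching the described API.
    if not shap_variance_penalty_factor:
        return mean_shap

    std_shap = abs_shap.std(axis=0)  # population std (ddof=0), an assumption
    return mean_shap - std_shap * shap_variance_penalty_factor


# Two features with the same mean |shap| but different spread.
example = np.array(
    [
        [1.0, 0.1],
        [1.0, 1.9],
        [1.0, 0.1],
        [1.0, 1.9],
    ]
)

unpenalized = penalized_shap_importance(example)
penalized = penalized_shap_importance(example, shap_variance_penalty_factor=1)
```

In this toy input both features have an unpenalized importance of 1.0, but with a factor of 1 the second feature drops to roughly 0.1 because its SHAP values vary widely, while the first is unchanged. That is the ranking effect the objective above describes.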