Penalty on shap calculation for higher variance (#218)
PR addresses #216 

**Overall objective:** Penalize features whose underlying shap values
have high variance when computing feature importance. This should
(in theory) encourage selection of features whose importance is more
consistent across CV folds.

**API design:**
- A single parameter, `shap_variance_penalty_factor`, both enables or
disables the penalty and controls how much is subtracted from the
mean(shap) computation.
- Formula: `penalized_shap_abs_mean = mean_shap - (std_shap * shap_variance_penalty_factor)`
- `shap_variance_penalty_factor` = None (the default)
- `shap_variance_penalty_factor` = 0 (same as None, no penalty)
- `shap_variance_penalty_factor` = 1 (one standard deviation is
subtracted from the mean)
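
To make the formula concrete, here is a minimal NumPy sketch. It is not the code from this PR: the function name `penalized_mean_abs_shap` is hypothetical, and it assumes that both the mean and the standard deviation are taken over absolute shap values, per feature, with the population standard deviation (`ddof=0`).

```python
import numpy as np


def penalized_mean_abs_shap(shap_values, shap_variance_penalty_factor=None):
    """Illustrative helper (hypothetical name, not the library's function).

    shap_values: array-like of shape (n_samples, n_features).
    Returns one importance score per feature.
    """
    # Assumption: importance is based on absolute shap values.
    abs_shap = np.abs(np.asarray(shap_values, dtype=float))
    mean_shap = abs_shap.mean(axis=0)

    # None and 0 both mean "no penalty", as described in the list above.
    if not shap_variance_penalty_factor:
        return mean_shap

    # Assumption: population standard deviation (ddof=0) of absolute values.
    std_shap = abs_shap.std(axis=0)
    return mean_shap - std_shap * shap_variance_penalty_factor


# Two features with the same mean |shap| but different spread.
example = np.array(
    [
        [1.0, 0.0],
        [-1.0, 2.0],
        [1.0, 0.0],
        [-1.0, -2.0],
    ]
)

unpenalized = penalized_mean_abs_shap(example)
penalized = penalized_mean_abs_shap(example, shap_variance_penalty_factor=1)
print(unpenalized)
print(penalized)
```

In this toy input both features have a mean absolute shap value of 1, but only the second one varies, so only the second one loses importance once the penalty is applied.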

**Work tasks:**
- [x] Add `shap_variance_penalty_factor` parameter to `ShapRFECV.fit`,
`compute` & `fit_compute`
- [x] Update `EarlyStoppingShapRFECV.fit`, `compute` & `fit_compute`
- [x] Add `shap_variance_penalty_factor` to
`shap_helpers.calculate_shap_importance`
- [x] Update `ShapModelInterpreter.fit`, `compute` & `fit_compute`
- [x] Add to doc strings throughout
- [x] Run simulation demonstrating impact of
`shap_variance_penalty_factor` feature
- [x] Add unit tests
- [x] Add tutorial comparing RFECV methods and results as jupyter
notebook

**Reviewers:** LMK any changes / improvements that can be made.

---------

Co-authored-by: Reinier Koops <info@reinier.work>
markdregan and Reinier Koops authored Jul 6, 2023
1 parent bec8f89 commit 4774281
Showing 5 changed files with 1,586 additions and 9 deletions.