
Claim 1.1 HS improves predictive performance of DT (Fig. 4) #1

Open
do8572 opened this issue Dec 24, 2022 · 0 comments
do8572 commented Dec 24, 2022

The prediction performance results for classification and regression are plotted in Fig 4A and Fig 4B respectively, with the number of leaves, a measure of model complexity, on the x-axis. We consider trees grown using four different techniques: CART, CART with cost-complexity pruning (CCP), C4.5 (Quinlan, 2014), and GOSDT (Lin et al., 2020), a method that grows trees that are optimal with respect to the cost-complexity penalized misclassification loss. To reduce clutter, Fig 4A/B displays only the results for CART and CART with CCP; the results for C4.5 (Fig S3) and GOSDT (Appendix S4.2) are deferred to the appendix.
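For readers reproducing this comparison, here is a minimal sketch of how CART and a CCP-pruned CART of comparable size can be grown with scikit-learn. The dataset, variable names, and the leaf-budget loop are illustrative assumptions, not the paper's or this repository's actual pipeline.

```python
# Minimal sketch (not the repository's pipeline): grow a CART tree with a
# fixed leaf budget, plus a cost-complexity-pruned variant of comparable size.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset for illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

m = 15  # target number of leaves (model-complexity budget)

# Plain CART grown until it has at most m leaves.
cart = DecisionTreeClassifier(max_leaf_nodes=m, random_state=0).fit(X_train, y_train)

# CART with CCP: fit an unrestricted tree, then take the smallest ccp_alpha
# along the pruning path whose pruned tree stays within the leaf budget.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)

cart_ccp = cart  # fallback if no alpha on the path yields <= m leaves
for alpha in path.ccp_alphas:  # alphas are sorted in increasing order
    candidate = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    if candidate.get_n_leaves() <= m:
        cart_ccp = candidate
        break

print(cart.get_n_leaves(), cart_ccp.get_n_leaves())
```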

For each of the four tree-growing methods, we grow a tree to a fixed number of leaves m, for several different choices of m ∈ {2, 4, 8, 12, 15, 20, 24, 28, 30, 32} (in practice, m would be pre-specified by a user or selected via cross-validation). For each tree, we compute its prediction performance before and after applying HS, where the regularization parameter for HS is selected from the set λ ∈ {0.1, 1.0, 10.0, 25.0, 50.0, 100.0} via cross-validation. Results for each experiment are averaged over 10 random data splits. We observe that HS (solid lines in Fig 4A/B) does not hurt prediction on any of our data sets, and often leads to substantial performance gains. For example, taking m = 15, we observe an average relative increase in predictive performance (measured by AUC) of 6.2%, 6.5%, and 8.0% for HS applied to CART, CART with CCP, and C4.5 respectively on the classification data sets. For the regression data sets with m = 15, we observe an average relative increase in R2 of 9.8% and 10.1% for CART and CART with CCP respectively.
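For concreteness, below is a minimal sketch of the hierarchical-shrinkage transformation applied post hoc to the `cart` tree fitted in the sketch above (reusing its `X_test`/`y_test`). The helper names (`shrunk_leaf_probas`, `predict_proba_hs`) are hypothetical, λ is fixed rather than cross-validated here, and the paper's reference implementation is the one shipped in the imodels package.

```python
# Sketch of hierarchical shrinkage on a fitted scikit-learn tree: each leaf
# estimate becomes the root mean plus the parent-to-child differences along
# its path, with each difference damped by 1 / (1 + lambda / N(parent)).
import numpy as np
from sklearn.metrics import roc_auc_score

def shrunk_leaf_probas(clf, lam):
    """Map each leaf node id to its hierarchically shrunk class-probability vector."""
    t = clf.tree_
    # Per-node class proportions (robust to whether tree_.value stores counts or fractions).
    proba = t.value[:, 0, :] / t.value[:, 0, :].sum(axis=1, keepdims=True)
    shrunk = {}

    def walk(node, cum):
        if t.children_left[node] == -1:  # leaf node
            shrunk[node] = cum
            return
        n_parent = t.n_node_samples[node]
        damp = 1.0 / (1.0 + lam / n_parent)  # shrink child-parent differences toward ancestors
        for child in (t.children_left[node], t.children_right[node]):
            walk(child, cum + damp * (proba[child] - proba[node]))

    walk(0, proba[0].copy())
    return shrunk

def predict_proba_hs(clf, X, lam):
    """Positive-class probabilities of the tree after hierarchical shrinkage."""
    shrunk = shrunk_leaf_probas(clf, lam)
    return np.array([shrunk[leaf][1] for leaf in clf.apply(X)])

# Before/after comparison on the held-out split; in the experiment lambda is
# chosen by cross-validation over {0.1, 1.0, 10.0, 25.0, 50.0, 100.0},
# here it is simply fixed for brevity.
auc_before = roc_auc_score(y_test, cart.predict_proba(X_test)[:, 1])
auc_after = roc_auc_score(y_test, predict_proba_hs(cart, X_test, lam=10.0))
print(f"AUC before HS: {auc_before:.3f}, after HS: {auc_after:.3f}")
```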

do8572 changed the title from "Claim 1.1" to "Claim 1.1 HS improves predictive performance (Fig. 4)" on Dec 24, 2022
do8572 self-assigned this on Dec 24, 2022
do8572 changed the title from "Claim 1.1 HS improves predictive performance (Fig. 4)" to "Claim 1.1 HS improves predictive performance of DT (Fig. 4)" on Jan 12, 2023