Automatically drop unneeded columns in choosers table #833
Note that when dropping columns from a pandas DataFrame, memory usage actually goes up momentarily before it goes down; see the discussion in pandas-dev/pandas#17092. That issue was "closed" but never resolved. Looking at my code changes from last week, I realized I was also dropping the columns a bit too late for the interaction simulate models: the columns should be dropped before the interaction df is created. I have now made that change. Below are the 1-zone benchmarking runs with the main branch (top) and with this PR (bottom). In non-sharrow mode, peak memory went down from 352 GB to 261 GB, a reduction of about 26%. Component-wise memory reduction, sorted by memory saving: I'm still dealing with some crashes in sharrow mode. |
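To illustrate the ordering point above (drop first, then build the interaction df), here is a minimal sketch. It is not the code in this PR: `columns_used` and `build_interaction_df` are hypothetical helpers, the spec is assumed to be a plain list of expression strings, and column names are assumed to be valid Python identifiers. The only pandas call relied on is `DataFrame.merge(how="cross")`.

```python
import re

import pandas as pd


def columns_used(expressions, available):
    """Return the available column names that appear as identifiers in any expression."""
    tokens = set()
    for expr in expressions:
        tokens.update(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr))
    # Token matching can over-keep (e.g. a name inside a string literal),
    # which costs some memory but never drops a column that is needed.
    return [col for col in available if col in tokens]


def build_interaction_df(choosers, alternatives, expressions):
    """Cross choosers with alternatives after trimming unused chooser columns."""
    keep = columns_used(expressions, choosers.columns)
    # Trim BEFORE the cross join, so unused columns are never replicated
    # once per alternative in the (much larger) interaction table.
    slim_choosers = choosers[keep]
    return slim_choosers.merge(alternatives, how="cross")
```

With choosers holding `income`, `age`, and `unused`, and expressions such as `"income * cost"` and `"age > 30"`, the interaction table would carry only `income`, `age`, and the alternatives' columns.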
If you read the thread closely, you'll note the comment that "this is not going to be solved in pandas 1." The behind-the-curtain memory management of pandas 1.x doesn't offer any way to drop unused columns without consuming extra RAM. This is alleviated by moving to pandas 2: after the transition, there are ways to get the memory benefit. This is (part of) the reason I've worked this past week to try to get us pandas 2 compatible. |
I have some updates for the Sharrow runs. Below are three 1-zone benchmarking runs with Sharrow. The one on the top (3/20) used the
|
Update RE:
This seems to be because flow.load time increased. For example, in the 3/20 run:
18/03/2024 19:27:38.294 - INFO - sharrow - flow exists in library: BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2
18/03/2024 19:27:38.294 - INFO - activitysim.core.flow - completed setting up flow school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums in 0:00:00.077000
18/03/2024 19:27:38.294 - INFO - activitysim.core.flow - begin flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
18/03/2024 19:30:05.321 - INFO - activitysim.core.flow - completed flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load in 0:02:27.026987 school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
18/03/2024 19:30:05.321 - INFO - activitysim.core.flow - completed apply_flow in 0:02:27.103988
...
18/03/2024 20:09:15.654 - INFO - activitysim.core.flow - setting up sharrow flow vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
18/03/2024 20:09:15.995 - INFO - sharrow - using existing flow code 736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X
18/03/2024 20:09:16.193 - INFO - activitysim.core.flow - completed setting up flow vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils in 0:00:00.552001
18/03/2024 20:09:16.193 - INFO - activitysim.core.flow - begin flow_736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X.load vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
18/03/2024 20:13:04.941 - INFO - activitysim.core.flow - completed flow_736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X.load in 0:03:48.747827 vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
18/03/2024 20:13:04.941 - INFO - activitysim.core.flow - completed apply_flow in 0:03:49.299828
...
In the 3/29 run:
29/03/2024 17:41:59.160 - INFO - sharrow - flow exists in library: BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2
29/03/2024 17:41:59.160 - INFO - activitysim.core.flow - completed setting up flow school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums in 0:00:00.078139
29/03/2024 17:41:59.160 - INFO - activitysim.core.flow - begin flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
29/03/2024 17:53:31.208 - INFO - activitysim.core.flow - completed flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load in 0:11:32.047284 school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
29/03/2024 17:53:31.208 - INFO - activitysim.core.flow - completed apply_flow in 0:11:32.125423
...
29/03/2024 19:11:49.595 - INFO - activitysim.core.flow - setting up sharrow flow vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
29/03/2024 19:11:49.924 - INFO - sharrow - using existing flow code 736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X
29/03/2024 19:11:49.955 - INFO - activitysim.core.flow - completed setting up flow vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils in 0:00:00.390580
29/03/2024 19:11:49.955 - INFO - activitysim.core.flow - begin flow_736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X.load vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
29/03/2024 19:21:27.151 - INFO - activitysim.core.flow - completed flow_736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X.load in 0:09:37.196634 vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
29/03/2024 19:21:27.151 - INFO - activitysim.core.flow - completed apply_flow in 0:09:37.587213
...
Not sure if this is because of recent sharrow updates, or because of recent updates in the way unused columns are dropped. |
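The size of the slowdown can be read off the log excerpts above with a little arithmetic. The sketch below is only an illustration: `load_times` and `slowdown` are hypothetical helpers, and the regular expression assumes the "completed flow_<hash>.load in H:MM:SS.ffffff" wording seen in these logs. The sample lines are copied from the two runs quoted above.

```python
import re
from datetime import timedelta

# Matches lines such as "... completed flow_<HASH>.load in 0:02:27.026987 ..."
LOAD_RE = re.compile(
    r"completed flow_(?P<flow>\w+)\.load in "
    r"(?P<h>\d+):(?P<m>\d+):(?P<s>\d+(?:\.\d+)?)"
)


def load_times(log_text):
    """Map each flow hash to the duration of its .load step."""
    times = {}
    for match in LOAD_RE.finditer(log_text):
        times[match["flow"]] = timedelta(
            hours=int(match["h"]),
            minutes=int(match["m"]),
            seconds=float(match["s"]),
        )
    return times


def slowdown(before_log, after_log):
    """Ratio of load time (after / before) for flows present in both logs."""
    before, after = load_times(before_log), load_times(after_log)
    return {flow: after[flow] / before[flow] for flow in before if flow in after}


RUN_0320 = """
18/03/2024 19:30:05.321 - INFO - activitysim.core.flow - completed flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load in 0:02:27.026987 school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
18/03/2024 20:13:04.941 - INFO - activitysim.core.flow - completed flow_736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X.load in 0:03:48.747827 vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
"""

RUN_0329 = """
29/03/2024 17:53:31.208 - INFO - activitysim.core.flow - completed flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load in 0:11:32.047284 school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
29/03/2024 19:21:27.151 - INFO - activitysim.core.flow - completed flow_736GLJ6FZR4AF6ZTRFF5NB4F2D2MYM7X.load in 0:09:37.196634 vehicle_type_choice.interaction_simulate.interaction_simulate.eval_interaction_utils
"""

for flow, ratio in slowdown(RUN_0320, RUN_0329).items():
    print(f"{flow}: {ratio:.1f}x")
```

On these excerpts the school location logsums flow load is roughly 4.7 times slower in the 3/29 run, and the vehicle type choice flow load roughly 2.5 times slower.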
I am not sure what has caused the code to run slower for @i-am-sijia; I am unable to replicate the problem. I have run the full-scale model on my laptop through school location choice (the first problematic model shown above) and got these results:
sharrow commit 7fae9f060b77684b2f309c9ad2ad3d2bf3286239 (sharrow as of 3/28)
sharrow commit c560b45f0e1cd5ccbb871319b5ccfbcf789debaf (sharrow v2.7, in use 3/20)
|
I recall we discussed that we would like to have a method to turn this feature off for individual components, either because it is interfering with something (e.g. tracing, estimation mode) or just because it's not working on a particular component (probably due to something weird in the spec). I do see that it turns itself off when tracing or estimation mode is used, which is OK, but I think we still want the capability to turn it off manually if desired. |
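The behaviour requested above (automatic switch-off under tracing or estimation, plus a manual per-component override) could be sketched as follows. This is an assumption-laden illustration, not the ActivitySim API: `ComponentOptions`, its `drop_unused_columns` field, and `should_drop_columns` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ComponentOptions:
    """Hypothetical per-component options; not an actual ActivitySim class."""

    # Manual switch: set to False in one component's settings to opt out.
    drop_unused_columns: bool = True


def should_drop_columns(options, tracing=False, estimation=False):
    """Drop columns only if the component allows it and no mode needs them all."""
    if tracing or estimation:
        # Tracing and estimation may read columns the spec never mentions.
        return False
    return options.drop_unused_columns
```

Keeping the manual flag separate from the automatic checks means a component with an unusual spec can opt out without affecting any other component.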
I reran the reported "3/20" and "3/29" runs; see Test 1 and Test 2 below. Last time they were run on different machines; this time they were run on the same machine. The good news is that this time I'm not seeing a huge runtime difference. Test 3 uses the same commits as one of Jeff's tests; apparently the run time on Windows is longer than on Mac.
Test 1 (Rerunning the "3/20" run)
Sharrow commit c560b45f0e1cd5ccbb871319b5ccfbcf789debaf (sharrow v2.7)
16/04/2024 13:41:49.585 - INFO - sharrow - flow exists in library: BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2
16/04/2024 13:41:49.587 - INFO - activitysim.core.flow - completed setting up flow school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums in 0:00:00.114174
16/04/2024 13:41:49.587 - INFO - activitysim.core.flow - begin flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
16/04/2024 13:49:11.177 - INFO - activitysim.core.flow - completed flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load in 0:07:21.590254 school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
16/04/2024 13:49:11.177 - INFO - activitysim.core.flow - completed apply_flow in 0:07:21.704428
Test 2 (Rerunning the "3/29" run)
Sharrow commit c560b45f0e1cd5ccbb871319b5ccfbcf789debaf (sharrow v2.7)
12/04/2024 20:47:53.356 - INFO - sharrow - flow exists in library: BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2
12/04/2024 20:47:53.356 - INFO - activitysim.core.flow - completed setting up flow school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums in 0:00:00.078128
12/04/2024 20:47:53.356 - INFO - activitysim.core.flow - begin flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
12/04/2024 20:56:44.189 - INFO - activitysim.core.flow - completed flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load in 0:08:50.832949 school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
12/04/2024 20:56:44.189 - INFO - activitysim.core.flow - completed apply_flow in 0:08:50.911078
Test 3 (use the latest sharrow)
Sharrow commit 7fae9f060b77684b2f309c9ad2ad3d2bf3286239 (sharrow v2.8.2 as of 3/28)
15/04/2024 13:55:03.917 - INFO - sharrow - flow exists in library: BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2
15/04/2024 13:55:03.917 - INFO - activitysim.core.flow - completed setting up flow school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums in 0:00:00.082882
15/04/2024 13:55:03.917 - INFO - activitysim.core.flow - begin flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
15/04/2024 14:01:53.135 - INFO - activitysim.core.flow - completed flow_BJFBKLAOJGCDYZF4MKEWTG744CT3JZF2.load in 0:06:49.217218 school_location.i1.logsums.gradeschool.compute_logsums.eval_nl_logsums
15/04/2024 14:01:53.135 - INFO - activitysim.core.flow - completed apply_flow in 0:06:49.300100 |
@jpn-- , I made the key code changes needed for turning this feature on and off for individual components, I haven't pushed them. It's essentially passing a Boolean Then I paused and read your latest comment on #824. Since you are generalizing the |
I like this idea. Let's finish the review/merge of #824 and then it should be easy to update this PR and put this straight into the |
@jpn-- , I implemented I tested setting |
This PR addresses #792