NCFS.fit() causes my computer to crash #30

paulcbogdan · 2022-11-11T15:03:25Z

Thank you very much for making the package. It's a great help. Sadly, it sometimes causes my computer to totally crash, particularly if I'm running another script simultaneously. During a crash, everything goes black, and I then need to restart it using my PSU's on/off switch. My guess is that this is related to parallel processing?

Even when the package doesn't crash, running fit causes this warning to appear:

C:\Users\paulc\Anaconda3\lib\site-packages\ncfs\NCFS.py:125: UserWarning: Data matrix contains values outside of the [0, 1] interval. May be numerical unstable and lead to pseudocount additions during fitting.
  warnings.warn(
C:\Users\paulc\Anaconda3\lib\site-packages\ncfs\accelerated.py:199: NumbaPerformanceWarning: 
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "C:\Users\paulc\Anaconda3\lib\site-packages\ncfs\accelerated.py", line 65:
@numba.njit(parallel=True, fastmath=True)
def feature_gradient(

I have no clue how numba works. Do you have any suggestions? I'm fine with the code running slower, so if there is any way for me to turn off the parallelism that would be appreciated.

Thanks again

dakota-hawkins · 2022-11-11T19:19:36Z

Hi there!

Sorry for the problem, and thanks for raising the issue. How big are the datasets you're working with? The method does produce good feature selection, but can be computationally expensive.

That diagnostic warning is Numba saying there are limited performance gains from setting a function to parallel, but when I benchmarked it empirically there were still significant speedups, so I left the decoration in.

One thing you can try is setting the NUMBA_NUM_THREADS environment variable to a lower thread count so your computer doesn't brick. In theory, if you set it below however many threads your other processes are using, you should be okay.

Info on limiting threads: https://numba.pydata.org/numba-doc/latest/user/threading-layer.html#setting-the-number-of-threads
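A minimal sketch of that suggestion, assuming the variable is set from inside the Python script rather than in the shell. The value has to be in place before Numba is first imported (including indirectly, through a package that depends on it), so the assignment goes at the very top of the script. The `try`/`except` is only there so the snippet also runs on a machine without Numba.

```python
import os

# Set this before the first `import numba`, and before importing any package
# that pulls Numba in, so the limit is in place when Numba initializes.
os.environ["NUMBA_NUM_THREADS"] = "1"

try:
    import numba

    # numba.config mirrors the NUMBA_* environment variables.
    print("Numba thread count:", numba.config.NUMBA_NUM_THREADS)
except ImportError:
    print("numba is not installed; the environment variable is set regardless")
```

If Numba was already imported earlier in the same process, the printed value may not reflect the new setting. In that case `numba.set_num_threads(n)` can lower the count at runtime, though only to a value at or below the one Numba started with.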

paulcbogdan · 2022-11-13T22:50:53Z

Thanks for the quick and detailed reply. The dataset has 200 examples with 34716 features each. This isn't too large (?), although I think running the NCFS was most likely to crash when I was running other scripts simultaneously, some of which use larger datasets.

I will try setting the numba threads to 1 and running it later this week, once these other scripts finish.

paulcbogdan · 2023-01-18T19:12:03Z

I am now working with a dataset in which each example has roughly 500,000 features, with 200-1000 examples. NCFS works fine with 200 examples. It still causes a crash at 1000 examples, even with NUMBA_NUM_THREADS = 1. For 400 examples, NCFS has not finished after 24 hours.

I am not familiar with the implementation details behind NCFS, but do you think the 1000-example case, even if I get it to stop crashing, is simply not computationally feasible on a consumer-level PC (64 GB of RAM and a decent processor)? If so, we can accept this, but before we conclude that it isn't computationally feasible, we would like to know whether we are simply doing something wrong.

Thank you again for making this package.
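As a rough back-of-envelope check on the sizes mentioned above, the sketch below estimates the memory needed just to hold the input matrix and one samples-by-samples matrix. It assumes dense float64 storage (8 bytes per value). `dense_gib` is a hypothetical helper written for this illustration, not part of the ncfs package, and the estimate says nothing about any extra copies or intermediates the fitting code may allocate.

```python
def dense_gib(n_rows, n_cols, bytes_per_value=8):
    """Size in GiB of a dense n_rows x n_cols array (8 bytes assumes float64)."""
    return n_rows * n_cols * bytes_per_value / 2**30


n_features = 500_000  # figure quoted in the thread
for n_samples in (200, 400, 1000):
    data = dense_gib(n_samples, n_features)    # the input matrix itself
    pairwise = dense_gib(n_samples, n_samples)  # one samples-by-samples matrix
    print(f"{n_samples} samples: data {data:.2f} GiB, pairwise {pairwise:.4f} GiB")
```

Under these assumptions the 1000-example input is roughly 3.7 GiB, well within 64 GB of RAM, so the raw data size alone would not explain a crash. Peak usage during fitting could still be much higher than this.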

paulcbogdan closed this as completed Nov 13, 2022

paulcbogdan reopened this Nov 13, 2022
