Is there an automatic way to find tau_max for run_pcmci? #32
This is how I ended up implementing the above approach to semi-automatically selecting `tau_max`:

```python
import numpy as np

DEF_CORRELATION_EPSILON = 0.25

def series_max_lag(arr, epsilon=None):
    """Return the first lag whose correlation lies in (-epsilon, epsilon),
    or len(arr) if the series never decays into that band."""
    if epsilon is None:
        epsilon = DEF_CORRELATION_EPSILON
    in_epsilon_range = (-epsilon < arr) & (arr < epsilon)
    if in_epsilon_range.any():
        return np.argmax(in_epsilon_range)
    return len(arr)

def find_max_lag(pcmci, tau_max, epsilon=None):
    # Apply series_max_lag along the lag axis of the lagged-dependency
    # array and take the maximum over all variable pairs.
    correlations = pcmci.get_lagged_dependencies(tau_max=tau_max)
    return np.max(np.apply_along_axis(series_max_lag, 2, correlations, epsilon=epsilon))
```
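To illustrate the thresholding step in isolation, here is a minimal, self-contained sketch of `series_max_lag` applied to synthetic decaying correlation series (the decay rates and array length are made up for the example; no `pcmci` object is needed):

```python
import numpy as np

def series_max_lag(arr, epsilon=0.25):
    # First lag whose correlation falls inside (-epsilon, epsilon);
    # returns len(arr) if the series never decays into that band.
    in_epsilon_range = (-epsilon < arr) & (arr < epsilon)
    if in_epsilon_range.any():
        return int(np.argmax(in_epsilon_range))
    return len(arr)

lags = np.arange(11)
fast = 0.9 * 0.5 ** lags    # 0.9, 0.45, 0.225, ... drops below 0.25 at lag 2
slow = -0.9 * 0.9 ** lags   # negative, still outside (-0.25, 0.25) at lag 10

print(series_max_lag(fast))  # → 2
print(series_max_lag(slow))  # → 11 (never decays within the window)
```

Because the check is two-sided, strongly negative correlations are treated symmetrically to positive ones.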
I will look into this function for the next update. Epsilon might be hard to choose for CMIknn or GPDC, and you would also need to take negative dependencies into account.
Great, thanks! Looking forward to your input, then. :)
An alternative that is based on the curvature of the plot rather than a constant threshold, and that requires no parameter, is to find the knee/elbow of each plot (as is often done in hyperparameter tuning of unsupervised tasks with a monotonically decreasing fitness score, like choosing k for k-means) and take the max of those. There's an algorithm, Kneedle, that does this algorithmically (as opposed to manually), and a nice Python implementation here: Do you think this might be a valid alternative approach?
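For a sense of what knee detection looks like here, the idea can be sketched without any external package using the simple "largest distance to the chord" heuristic (a crude stand-in for the actual Kneedle algorithm; the function name and synthetic data below are made up for illustration):

```python
import numpy as np

def find_elbow(y):
    # Elbow of a monotonically decreasing curve: the point farthest
    # (perpendicularly) from the straight chord joining the first and
    # last points. A simplified stand-in for Kneedle, not the real thing.
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    p0 = np.array([x[0], y[0]])
    p1 = np.array([x[-1], y[-1]])
    chord = (p1 - p0) / np.linalg.norm(p1 - p0)
    vecs = np.stack([x, y], axis=1) - p0
    # Subtract each point's projection onto the chord to get the
    # perpendicular component, then take its length.
    proj = np.outer(vecs @ chord, chord)
    dists = np.linalg.norm(vecs - proj, axis=1)
    return int(np.argmax(dists))

# Correlations decaying geometrically with lag: the elbow sits where
# the curve bends from steep decay to a flat tail.
corr = 0.5 ** np.arange(11)
print(find_elbow(corr))  # → 3
```

The max of `find_elbow` over all lag functions would then play the role that `find_max_lag` plays in the epsilon-based version.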
Hey there,
This is not a problem with the package, rather a question regarding a possible improvement. As the title states, I was wondering whether there is an automatic way to find `tau_max` for `run_pcmci`. In the tutorial, you plotted the lagged unconditional dependencies (the lagged correlations) and chose the lag after which dependencies decay.
Based on that, I thought a possible way to automate it is to find, for each such series of correlation vs lag, the lag for which the correlation is close enough to 0 (it is in [-Ɛ, Ɛ]), and take the max out of those (and so Ɛ is a parameter with which the user can control the level of decay required, but which can have a nice default like 0.1 or 0.05).
What do you think? If it's a silly idea, I'd love to know that as well, and also get help thinking of a correct method to do this. :)
Cheers,
Shay