Is there an automatic way to find tau_max for run_pcmci? #32

Open
shaypal5 opened this issue May 5, 2019 · 4 comments

@shaypal5
Contributor

shaypal5 commented May 5, 2019

Hey there,

This is not a problem with the package, rather a question regarding a possible improvement. As the title states, I was wondering whether there is an automatic way to find tau_max for run_pcmci?

In the tutorial, you plotted the lagged unconditional dependencies (the lagged correlations) and chose the lag after which dependencies decay:
[image: tutorial plot of the lagged unconditional dependencies]

Based on that, I thought a possible way to automate this is to find, for each such series of correlation vs. lag, the first lag at which the correlation is close enough to 0 (i.e., it lies in [-Ɛ, Ɛ]), and then take the max over those lags. Ɛ is thus a parameter with which the user can control the level of decay required, and it can have a sensible default like 0.1 or 0.05.

What do you think? If it's a silly idea, I'd love to know that as well, and also get help thinking of a correct method to do this. :)

Cheers,
Shay

@shaypal5 changed the title from "Is there an automatic way to find tau_max in run_pcmci?" to "Is there an automatic way to find tau_max for run_pcmci?" on May 5, 2019
@shaypal5
Contributor Author

shaypal5 commented May 6, 2019

This is how I ended up implementing the above approach to semi-automatically select the max_lag parameter (semi-automatically because epsilon is required as input):

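import numpy as np

DEF_CORRELATION_EPSILON = 0.25

def series_max_lag(arr, epsilon=None):
    """Return the first lag at which |arr| drops below epsilon."""
    if epsilon is None:
        epsilon = DEF_CORRELATION_EPSILON
    # Symmetric band, so negative dependencies are treated like positive ones.
    in_epsilon_range = np.abs(arr) < epsilon
    if in_epsilon_range.any():
        return int(np.argmax(in_epsilon_range))  # index of the first True
    return len(arr)  # never falls inside the band within the tested lags

def find_max_lag(pcmci, tau_max, epsilon=None):
    """Return the largest per-pair decay lag over all variable pairs."""
    correlations = pcmci.get_lagged_dependencies(tau_max=tau_max)
    # Assumption: some tigramite versions return a dict here rather than the
    # (N, N, tau_max + 1) array; if so, take its 'val_matrix' entry.
    if isinstance(correlations, dict):
        correlations = correlations['val_matrix']
    return int(np.max(np.apply_along_axis(series_max_lag, 2, correlations, epsilon=epsilon)))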
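
To sanity-check the thresholding rule without running PCMCI, here is a minimal, self-contained sketch on a hand-made array. The values in `toy` are made up for illustration (they are not tigramite output), and `first_lag_within` is just a hypothetical compact restatement of the rule above, assuming the dependencies come as an array of shape (N, N, tau_max + 1):

```python
import numpy as np

def first_lag_within(series, epsilon):
    """Return the first lag whose |value| < epsilon, or len(series) if none."""
    inside = np.abs(series) < epsilon
    return int(np.argmax(inside)) if inside.any() else len(series)

# Made-up dependency values for 2 variables and lags 0..4,
# shaped (N, N, tau_max + 1).
toy = np.array([
    [[1.0, 0.6, 0.3, 0.1, 0.05],       # X0 -> X0
     [-0.3, -0.5, -0.4, -0.3, -0.02]], # X0 -> X1 (negative dependence)
    [[0.02, 0.01, 0.0, 0.0, 0.0],      # X1 -> X0 (already weak at lag 0)
     [1.0, 0.4, 0.2, 0.05, 0.01]],     # X1 -> X1
])

# Apply the rule along the lag axis, then take the max over all pairs.
per_pair = np.apply_along_axis(first_lag_within, 2, toy, 0.25)
suggested_tau_max = int(per_pair.max())

print(per_pair)           # per-pair decay lags
print(suggested_tau_max)  # max over all pairs
```

Two things to keep in mind with this rule: a pair that is already inside the band at lag 0 contributes 0, and a series that dips into the band and later leaves it again is cut off at the first dip, so the result is only meaningful when the dependencies decay roughly monotonically.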

@jakobrunge
Owner

I will look into this function for the next update. Epsilon might be hard to choose for CMIknn or also GPDC. And you would need to take into account negative dependencies as well...

@shaypal5
Contributor Author

Great, thanks! Looking forward to your input, then. :)

@shaypal5
Contributor Author

An alternative that is based on the curvature of the plot rather than on a constant threshold, and that requires no parameter, is to find the knee/elbow of each plot (as is often done in hyperparameter tuning of unsupervised tasks with a monotonically decreasing fitness score, like choosing k for k-means) and take the max of those.

There's an algorithm (Kneedle) to do this "algorithmically" (as opposed to manually), and a nice Python implementation here:
https://github.com/arvkevi/kneed
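
To make the knee idea concrete, here is a rough sketch of a much simpler elbow heuristic: normalize each series of absolute dependencies to [0, 1] and pick the lag that lies farthest below the straight line joining the first and last points. To be clear, this is not the Kneedle algorithm and does not use the kneed API; `knee_lag` is a hypothetical helper and the input values are made up, so treat it only as an illustration of the general approach:

```python
def knee_lag(series):
    """Return the lag at the elbow of |series| (simple chord heuristic)."""
    mags = [abs(float(v)) for v in series]  # negative dependencies count too
    n = len(mags)
    lo, hi = min(mags), max(mags)
    if n < 3 or hi == lo:
        return 0  # too short or flat: no elbow to find
    # Normalize both axes to [0, 1] so lag units do not dominate.
    ys = [(m - lo) / (hi - lo) for m in mags]
    xs = [i / (n - 1) for i in range(n)]
    # Height of the chord from the first to the last point at each lag.
    chord = [ys[0] + (ys[-1] - ys[0]) * x for x in xs]
    # The elbow of a decaying curve is where it sags farthest below the chord.
    gaps = [c - y for c, y in zip(chord, ys)]
    return max(range(n), key=gaps.__getitem__)

# Made-up dependency-vs-lag series for two variable pairs.
pairs = [
    [1.0, 0.6, 0.3, 0.1, 0.05, 0.04, 0.03],
    [-0.5, -0.45, -0.1, -0.05, -0.02],
]
knees = [knee_lag(s) for s in pairs]
suggested_tau_max = max(knees)
print(knees, suggested_tau_max)
```

Like the epsilon rule, this assumes each series decays more or less monotonically, and the elbow it finds depends on how many lags are included in the series.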

Do you think this might be a valid alternative approach?
