Is there an automatic way to find tau_max for run_pcmci? #32

shaypal5 · 2019-05-05T08:58:21Z

Hey there,

This is not a problem with the package, rather a question regarding a possible improvement. As the title states, I was wondering whether there is an automatic way to find tau_max for run_pcmci?

In the tutorial, you plotted the lagged unconditional dependencies (the lagged correlations) and chose the lag after which dependencies decay:

Based on that, I thought a possible way to automate it is as follows: for each series of correlation vs. lag, find the lag at which the correlation is close enough to 0 (i.e. it lies in [-Ɛ, Ɛ]), then take the max over all series. Ɛ is thus a parameter with which the user controls the level of decay required, and it can have a sensible default like 0.1 or 0.05.

What do you think? If it's a silly idea, I'd love to know that as well, and also get help thinking of a correct method to do this. :)

Cheers,
Shay

shaypal5 · 2019-05-06T08:41:13Z

This is how I ended up implementing the above approach to semi-automatically select the max_lag parameter (semi-automatically because epsilon is still required as input):

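import numpy as np

DEF_CORRELATION_EPSILON = 0.25

def series_max_lag(arr, epsilon=None):
    """Return the first lag whose dependency lies in (-epsilon, epsilon)."""
    if epsilon is None:
        epsilon = DEF_CORRELATION_EPSILON
    in_epsilon_range = (-epsilon < arr) & (arr < epsilon)
    if in_epsilon_range.any():
        # argmax on a boolean array gives the index of the first True
        return int(np.argmax(in_epsilon_range))
    # No value falls inside the band: fall back to the series length
    return len(arr)

def find_max_lag(pcmci, tau_max, epsilon=None):
    """Return the largest per-pair decay lag; tau_max bounds the lags searched."""
    # Assumes get_lagged_dependencies returns an array of shape (N, N, tau_max + 1)
    correlations = pcmci.get_lagged_dependencies(tau_max=tau_max)
    return int(np.max(np.apply_along_axis(series_max_lag, 2, correlations, epsilon=epsilon)))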
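
One caveat with taking the first lag inside the band: a lagged correlation can pass through zero while changing sign and then leave the band again, so the first in-band lag may come too early. Below is a minimal pure-Python sketch of a possible variant that instead returns the lag just after the last out-of-band value. The names `last_significant_lag` and `find_max_lag_last` are hypothetical, and the nested-list input only stands in for the (N, N, tau_max + 1) array assumed above.

```python
def last_significant_lag(series, epsilon=0.25):
    """Return the smallest lag from which every value stays inside (-epsilon, epsilon)."""
    last_outside = -1
    for lag, value in enumerate(series):
        # abs() treats positive and negative dependencies alike
        if abs(value) >= epsilon:
            last_outside = lag
    # If nothing is outside the band this is 0; if the final value is
    # outside the band this equals len(series), i.e. no decay was observed.
    return last_outside + 1


def find_max_lag_last(correlations, epsilon=0.25):
    """Maximum of last_significant_lag over all variable pairs.

    `correlations` is a nested list indexed as [i][j][lag] (an assumption
    made for this sketch).
    """
    return max(
        last_significant_lag(series, epsilon)
        for row in correlations
        for series in row
    )


# Hand-made example: the value at lag 1 is inside the band only because the
# correlation is changing sign; it is outside the band again at lags 2 and 3.
example = [0.8, 0.1, -0.5, -0.3, 0.05, 0.01]
print(last_significant_lag(example))
```

On this example the first-in-band rule would stop at lag 1, while the variant keeps going until the values settle inside the band.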

jakobrunge · 2019-05-14T17:07:58Z

I will look into this function for the next update. Epsilon might be hard to choose for CMIknn or also GPDC. And you would need to take into account negative dependencies as well...

shaypal5 · 2019-05-15T08:24:21Z

Great, thanks! Looking forward to your input, then. :)

shaypal5 · 2019-05-21T14:50:59Z

An alternative, based on the curvature of the plot rather than on a constant threshold and requiring no parameter, is to find the knee/elbow of each plot (as is often done when tuning hyperparameters for unsupervised tasks with a monotonically decreasing fitness score, like choosing k for k-means) and take the max of those.

There's an algorithm (Kneedle) to do this "algorithmically" (as opposed to manually), and a nice Python implementation here:
https://github.com/arvkevi/kneed

Do you think this might be a valid alternative approach?
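
To make the idea concrete, here is a rough pure-Python sketch of an elbow finder. Note that this is a simplified chord-distance heuristic, not the Kneedle algorithm and not the `kneed` API: after rescaling both axes to [0, 1], it picks the point farthest from the straight line joining the first and last points. The names `elbow_index` and `max_elbow_lag` are hypothetical, absolute values are used so that negative dependencies count by magnitude, and the nested-list input indexed as [i][j][lag] is an assumption.

```python
def elbow_index(values):
    """Index of the point farthest from the chord joining the first and last points."""
    mags = [abs(v) for v in values]
    n = len(mags)
    lo, hi = min(mags), max(mags)
    if n < 3 or hi == lo:
        # Too short or completely flat: no elbow to speak of
        return 0
    # Rescale both axes to [0, 1] so lag units and dependency units are comparable
    xs = [i / (n - 1) for i in range(n)]
    ys = [(m - lo) / (hi - lo) for m in mags]
    dx, dy = 1.0, ys[-1] - ys[0]
    norm = (dx * dx + dy * dy) ** 0.5
    best_i, best_d = 0, -1.0
    for i in range(n):
        # Perpendicular distance from (xs[i], ys[i]) to the chord
        d = abs(dy * xs[i] - dx * (ys[i] - ys[0])) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


def max_elbow_lag(correlations):
    """Maximum elbow lag over all variable pairs."""
    return max(elbow_index(series) for row in correlations for series in row)


decaying = [0.9, 0.5, 0.2, 0.1, 0.08, 0.07]
print(elbow_index(decaying))
```

A series that does not decay monotonically can put the elbow somewhere misleading, so this would probably need a sanity check against the plots before being trusted.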

shaypal5 changed the title ~~Is there an automatic way to find tau_max in run_pcmci?~~ Is there an automatic way to find tau_max for run_pcmci? May 5, 2019

shaypal5 mentioned this issue Jun 2, 2019

Which links to select for get_lagged_dependencies? (Warning: Link specified in selected links that is outside the scope of the selected variables) #35

Closed
