Component values revisited #47
Comments
@herrstucki I'm very interested in slow queries. What would help me is a list of generated SPARQL queries on our datasets that run slowly, so I can look at the query plan and talk to the triplestore vendor about optimizing them.
@ktk See e.g. the query I linked in my first message http://yasgui.org/short/EyeZfAUrv … when you remove the |
@ktk For reference, this is a dataset with LOTS of observations which seems to slow down the |
The above query is ~10x faster if the labels are removed. It's still pretty slow though (~1s).
Currently this is solved with a DISTINCT query, which needs to scan all Observations to be sure that no other 'value' exists. So it will take longer (linearly) for bigger datasets. (It can potentially be optimised by first getting the distinct URIs and afterwards fetching the labels for those URIs, as URIs normally have a more performant index.) But overall this should be solved by adding the 'shape' from the new cube description, which simply defines the possible values explicitly for ordinal dimensions. At least as long as we do not have any filters set. Filters need to scan anyway, but potentially a smaller part of the dataset.
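To make the two-step idea from the comment above concrete, here is a rough sketch of what the two queries could look like when built from TypeScript. This is not what the library generates: the function names are made up for illustration, and the use of qb:Observation, qb:dataSet and rdfs:label is an assumption about how the cube and its labels are modelled.

```typescript
// Hypothetical helpers, not part of the library's API.
// Assumption: observations are typed qb:Observation and linked via qb:dataSet,
// and labels are attached with rdfs:label.
const PREFIXES = [
  "PREFIX qb: <http://purl.org/linked-data/cube#>",
  "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
].join("\n");

// Step 1: fetch only the distinct value URIs, without joining labels.
function distinctValuesQuery(dataSetIri: string, dimensionIri: string): string {
  return `${PREFIXES}
SELECT DISTINCT ?value WHERE {
  ?obs a qb:Observation ;
       qb:dataSet <${dataSetIri}> ;
       <${dimensionIri}> ?value .
}`;
}

// Step 2: look up labels for exactly the URIs returned by step 1.
function labelsQuery(valueIris: string[]): string {
  const values = valueIris.map((iri) => `<${iri}>`).join(" ");
  return `${PREFIXES}
SELECT ?value ?label WHERE {
  VALUES ?value { ${values} }
  OPTIONAL { ?value rdfs:label ?label }
}`;
}
```

The second query never touches the observations, so its cost should depend on the number of distinct values rather than on the size of the dataset. Whether that is actually faster on a given triplestore would need to be measured.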
Follow-up to #37 and #38.
It turns out that in practice using componentsValues() is ~2x slower than fetching componentValues for each dimension in parallel 😅
I'm not sure if there's a solution to this on the level of this library (optimizing the generated query), or if it depends on how the triple store is set up (possible to index this query?), specify the dimension values explicitly as rdfs:range …
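For reference, "fetching componentValues for each dimension in parallel" can be sketched roughly as below. The fetchOne callback is a hypothetical stand-in for whatever call loads the values of a single dimension; its signature is an assumption and not the library's actual API.

```typescript
// Hypothetical callback type: loads the values of one dimension.
type FetchValues<T> = (dimension: string) => Promise<T[]>;

// Start one request per dimension right away and wait for all of them.
function fetchValuesInParallel<T>(
  dimensions: string[],
  fetchOne: FetchValues<T>
): Promise<Map<string, T[]>> {
  const pending = dimensions.map(async (dimension) => {
    const values = await fetchOne(dimension);
    return [dimension, values] as const;
  });
  // Promise.all rejects as soon as any single request fails.
  return Promise.all(pending).then((entries) => new Map(entries));
}
```

All requests are issued before any of them is awaited, so the total time is roughly that of the slowest single query instead of the sum.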
Currently, we're using this functionality for three things:
As said, for 1. I hope that this won't be needed at all in the future. For 3. I realize that componentsValues or fetching all values up-front is probably overkill and we should just use componentValues when we need it (i.e. when the UI is shown).
👉 For 2. (and 1.) I think it would really be useful to be able to specify a limit on componentValues/componentsValues to avoid over-fetching, because for these cases we really only need one value. I'm not sure about ordering/sampling because using anything other than LIMIT results in a much slower query (e.g. see http://yasgui.org/short/EyeZfAUrv)
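A minimal sketch of what an optional limit could look like on the query-building side. The function name, the options shape and the simplified query pattern are all assumptions for illustration, not the library's current API or generated query.

```typescript
// Hypothetical options object; `limit` is the only proposed addition.
interface ValuesQueryOptions {
  limit?: number;
}

// Build a (simplified) values query and append LIMIT only when requested.
function componentValuesQuery(
  dimensionIri: string,
  options: ValuesQueryOptions = {}
): string {
  const { limit } = options;
  if (limit !== undefined && (!Number.isInteger(limit) || limit < 1)) {
    throw new RangeError("limit must be a positive integer");
  }
  const lines = [
    "SELECT DISTINCT ?value WHERE {",
    `  ?obs <${dimensionIri}> ?value .`,
    "}",
  ];
  // No ORDER BY or SAMPLE: a plain LIMIT lets the store stop early.
  if (limit !== undefined) {
    lines.push(`LIMIT ${limit}`);
  }
  return lines.join("\n");
}
```

With componentValuesQuery(dimensionIri, { limit: 1 }) the store can return as soon as it has found one value. Which value comes back is then unspecified, which seems acceptable for the cases where any single value will do.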