Explicitly vs. automatically selected dimensions #51

jstcki · 2020-02-04T17:04:02Z

Hi!

I just noticed a discrepancy between how explicitly and automatically selected dimensions are handled, and another aspect that makes automatic selects less than useful.

1. Labels are only returned for explicitly selected dimensions (oddly, except for years, which have an empty label).
2. Automatic selects cannot really be used for anything, since their keys are derived from the translated dimension label. These slugified keys cannot be re-associated with a dimension (which is necessary to get the dimension's label etc.), so in the end we have to select all dimensions manually anyway.

Point 2 could actually be solved neatly by generating keys from the dimension IRI instead of the label. If the behavior in point 1 were consistent (i.e. labels present for auto-selects), this would remove the need to select dimensions explicitly at all.

For example:

```js
// Instead of this
[{ forestZone: {...}, canton: {...} }, ...]
// something like this could be returned
[{
  "http://environment.ld.admin.ch/foen/px/0703010000_102/dimension/1": {...},
  "http://environment.ld.admin.ch/foen/px/0703010000_102/dimension/2": {...}
}, ...]
```
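To illustrate why slugified keys are hard to re-associate, here is a minimal sketch. The `slugify` function is hypothetical (an approximation of camelCasing a label, not the library's actual implementation), and the German/English labels are illustrative. The same dimension gets a different key in each language, whereas the IRI is stable and maps straight back to the dimension.

```javascript
// Hypothetical slugifier: camelCases a label. Not the library's real implementation.
function slugify(label) {
  const words = label
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip diacritics
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean);
  return words
    .map((w, i) =>
      i === 0 ? w.toLowerCase() : w[0].toUpperCase() + w.slice(1).toLowerCase()
    )
    .join("");
}

// Illustrative dimension metadata (labels are assumptions).
const dimensions = [
  {
    iri: "http://environment.ld.admin.ch/foen/px/0703010000_102/dimension/1",
    labels: { de: "Waldzone", en: "Forest zone" },
  },
  {
    iri: "http://environment.ld.admin.ch/foen/px/0703010000_102/dimension/2",
    labels: { de: "Kanton", en: "Canton" },
  },
];

// IRI keys: re-associating a result key with its dimension is a direct lookup.
const byIri = new Map(dimensions.map((d) => [d.iri, d]));

// Slug keys: the lookup only works if every label is re-slugified
// in the same language that produced the key.
function findBySlug(key, lang) {
  return dimensions.find(
    (d) => d.labels[lang] !== undefined && slugify(d.labels[lang]) === key
  );
}
```

With these inputs, `findBySlug("forestZone", "en")` finds the first dimension, but the same key yields `undefined` for `"de"`, while `byIri.get(iri)` works regardless of language.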

If IRIs are used as keys, the argument to .select() could simply be an array of components, or just their IRIs, instead of binding names that I have to specify myself (which is also risky, since these are not slugified!).
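If `.select()` accepted IRIs, the library could generate valid SPARQL variable names internally and keep a reverse map to re-key result rows by IRI. A minimal sketch, assuming hypothetical helpers `bindingsForIris`, `selectClause` and `rekeyRow` and a plain `{ variableName: value }` row shape; none of these are part of the library's current API.

```javascript
// Hypothetical helper: assigns each distinct dimension IRI a generated,
// always-valid SPARQL variable name (dim0, dim1, ...).
function bindingsForIris(iris) {
  const varToIri = new Map();
  [...new Set(iris)].forEach((iri, i) => varToIri.set(`dim${i}`, iri));
  return varToIri;
}

// Builds the projection part of a SPARQL query from the generated names.
function selectClause(varToIri) {
  return "SELECT " + [...varToIri.keys()].map((v) => `?${v}`).join(" ");
}

// Re-keys one result row ({ dim0: ..., dim1: ... }) by dimension IRI,
// dropping any variables that are not selected dimensions.
function rekeyRow(row, varToIri) {
  const out = {};
  for (const [varName, value] of Object.entries(row)) {
    const iri = varToIri.get(varName);
    if (iri !== undefined) out[iri] = value;
  }
  return out;
}

const iris = [
  "http://environment.ld.admin.ch/foen/px/0703010000_102/dimension/1",
  "http://environment.ld.admin.ch/foen/px/0703010000_102/dimension/2",
];
const varToIri = bindingsForIris(iris);
const row = rekeyRow({ dim0: "Zone 1", dim1: "Bern", extra: 1 }, varToIri);
```

Because users never choose variable names, labels with spaces or accents can no longer produce invalid bindings.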


vhf · 2020-02-10T09:58:17Z

Hey, thanks!

Missing labels for automatically selected dimensions: will fix!
To me your suggestion makes sense. It will make a few things uglier, for instance:
- `.groupBy("raum")` -> `.groupBy("https://ld.stadt-zuerich.ch/statistics/property/RAUM")`
- `.filter(({ someDate }) => someDate.not.equals("2019-08-29T07:27:56.241Z"));` would no longer be possible (no big deal, though)
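On plain JavaScript objects, IRI keys stay fairly readable if each IRI is bound to a constant once and used through bracket access or a computed key in a destructuring pattern. A minimal sketch on an in-memory array: the rows are illustrative, the `DATE` IRI is hypothetical, and `groupBy` is a local helper, not the library's `.groupBy()`.

```javascript
const RAUM = "https://ld.stadt-zuerich.ch/statistics/property/RAUM";
const DATE = "http://example.org/dimension/date"; // hypothetical IRI

// Illustrative rows keyed by dimension IRI.
const rows = [
  { [RAUM]: "Kreis 1", [DATE]: "2019-08-29T07:27:56.241Z", value: 3 },
  { [RAUM]: "Kreis 1", [DATE]: "2019-08-30T07:27:56.241Z", value: 5 },
  { [RAUM]: "Kreis 2", [DATE]: "2019-08-30T07:27:56.241Z", value: 7 },
];

// A computed key in the destructuring pattern keeps the callback short.
const notThatDay = rows.filter(
  ({ [DATE]: someDate }) => someDate !== "2019-08-29T07:27:56.241Z"
);

// Local grouping helper: Map from the value at `iri` to matching rows.
function groupBy(items, iri) {
  const groups = new Map();
  for (const item of items) {
    const key = item[iri];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return groups;
}

const byRaum = groupBy(rows, RAUM);
```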

I'll try something and we'll then discuss the details in a PR.

jstcki · 2020-02-10T13:15:04Z

Note that querying for labels on all dimensions makes everything much slower, so I wonder if there is a better way to do this, for example by querying for labels only in cube.dimensions() and then stitching them together with the label-less result of cube.query(). I haven't tried it, though.
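The stitching idea amounts to a client-side join: load the labels once (keyed by IRI), run the label-less query, and attach labels in JavaScript. A minimal sketch, assuming rows keyed by dimension IRI with value IRIs as cell values, and a `labels` `Map` from value IRI to label; these shapes and the `stitchLabels` name are assumptions, not what `cube.dimensions()` or `cube.query()` return today.

```javascript
// Attaches labels to label-less rows. Each cell becomes { value, label };
// label is null when no label is known for that IRI.
function stitchLabels(rows, labels) {
  return rows.map((row) => {
    const out = {};
    for (const [dimIri, valueIri] of Object.entries(row)) {
      out[dimIri] = { value: valueIri, label: labels.get(valueIri) ?? null };
    }
    return out;
  });
}

// Illustrative data (hypothetical IRIs).
const CANTON = "http://example.org/dimension/canton";
const labels = new Map([["http://example.org/canton/BE", "Bern"]]);
const rows = [
  { [CANTON]: "http://example.org/canton/BE" },
  { [CANTON]: "http://example.org/canton/ZH" },
];
const labelled = stitchLabels(rows, labels);
```

The join itself is linear in the number of cells; the cost moves to loading the label map, which only has to happen once per cube and language.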

vhf · 2020-02-11T14:37:44Z

> Note that querying for labels on all dimensions makes everything much slower

Could you please tell us more about this? Would running datacube.components() to fetch all labels be too costly?

jstcki · 2020-02-11T17:30:15Z

I meant that currently, cube.select(allDimensions).query() is much slower than cube.select([]).query() because selecting dimensions queries for all dimension value labels on each observation.

This is probably related to #47 … adding labels to the query unfortunately makes it much slower.

BTW, we're currently also always setting all potential languages on the entrypoint, e.g. ["de", "fr", "it", "en", ""], because some datasets may only be available in one of them and it's not clear what the fallback should be. Does adding more languages make the query slower? This could probably be optimized if the datasets declared their available languages correctly.
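Requesting every language still leaves the client to choose one label per resource. A minimal sketch of such a fallback, walking a preference list like `["de", "fr", "it", "en", ""]` and returning the first label found; the `pickLabel` name and the `{ languageTag: text }` input shape (with `""` for untagged literals) are assumptions.

```javascript
// labels: object mapping a language tag ("" = untagged literal) to label text.
function pickLabel(labels, preferred = ["de", "fr", "it", "en", ""]) {
  for (const lang of preferred) {
    if (Object.prototype.hasOwnProperty.call(labels, lang)) return labels[lang];
  }
  // None of the preferred languages is present: take any available label.
  const any = Object.values(labels);
  return any.length > 0 ? any[0] : undefined;
}
```

For example, `pickLabel({ fr: "Zone forestière", en: "Forest zone" })` returns the French label, because `"fr"` precedes `"en"` in the default order.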

vhf · 2020-02-12T14:30:19Z

Yeah, adding labels definitely makes things slower, and yes, adding more languages makes it even slower.

I think not fetching labels for automatically selected dimensions and using dimension IRIs as keys would solve most of the issue. Users could fetch dimensions and their labels independently and possibly cache them.
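Fetching labels independently and caching them could look like this minimal sketch: a promise cache keyed by cube IRI and language, so concurrent callers share a single request. `fetchLabels` is a hypothetical stand-in for whatever call actually loads the labels (for instance one built on `cube.dimensions()`).

```javascript
// Wraps an async loader so each (cubeIri, lang) pair is fetched at most once.
// Caching the promise rather than the result also deduplicates concurrent calls.
function cachedLabelLoader(fetchLabels) {
  const cache = new Map();
  return function load(cubeIri, lang) {
    const key = `${cubeIri}\u0000${lang}`;
    if (!cache.has(key)) {
      const pending = fetchLabels(cubeIri, lang).catch((err) => {
        cache.delete(key); // don't keep failed requests in the cache
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

// Usage with a stub loader (hypothetical): counts how often it is called.
let calls = 0;
const load = cachedLabelLoader(async (cubeIri, lang) => {
  calls++;
  return new Map([[cubeIri, `labels in ${lang}`]]);
});
const first = load("http://example.org/cube", "de");
const second = load("http://example.org/cube", "de");
load("http://example.org/cube", "fr");
```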

> This could probably be optimized if the datasets declared available languages correctly.

@ktk what do you think about this, is it possible to declare the languages somewhere?

jstcki mentioned this issue May 29, 2020

Ideas for new API #57 (open)
