
Support multiple tables / star schema #3

Open
pudo opened this issue Dec 4, 2015 · 10 comments

pudo (Contributor) commented Dec 4, 2015

Right now, only a single fact table is supported, and aggregations over dimensions assume that all columns for a given dimension are fully denormalised onto that table. The application should instead support JOINs across a star schema.
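To make the distinction concrete, here is a minimal sketch of what a star-schema aggregation would look like. Table and column names (`fact_budget`, `dim_program`) are hypothetical, not babbage's actual model:

```python
# Minimal star-schema sketch; table and column names are hypothetical.
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")  # any SQL backend would do

with engine.begin() as conn:
    # Dimension attributes live in their own table; the fact table only
    # holds foreign keys and measures.
    conn.execute(text(
        "CREATE TABLE dim_program (id INTEGER PRIMARY KEY, code TEXT, label TEXT)"))
    conn.execute(text(
        "CREATE TABLE fact_budget (program_id INTEGER REFERENCES dim_program (id), "
        "amount NUMERIC)"))
    conn.execute(text(
        "INSERT INTO dim_program VALUES (1, 'P01', 'Health'), (2, 'P02', 'Education')"))
    conn.execute(text("INSERT INTO fact_budget VALUES (1, 100), (1, 50), (2, 75)"))

    # Aggregating by a dimension attribute now requires a JOIN, where the
    # current denormalised layout can simply GROUP BY a column in place.
    rows = conn.execute(text("""
        SELECT d.label, SUM(f.amount) AS amount_sum
        FROM fact_budget f
        JOIN dim_program d ON d.id = f.program_id
        GROUP BY d.label
    """)).fetchall()
    print(rows)  # e.g. [('Education', 75), ('Health', 150)]
```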

pwalsh (Member) commented Dec 7, 2015

@akariv first we need to understand whether we need this. Assigning to you to investigate as part of the general deep dive on babbage.

pudo (Contributor, Author) commented Dec 7, 2015

It might be useful to distinguish between loading from multiple resources and keeping multiple tables for dimensions internally. The two are almost certainly a "false friend" equality (i.e. they look like the same thing but in practice turn out not to be).

pwalsh (Member) commented Dec 7, 2015

@pudo totally agree. It seems to me that single resources, in FDP-speak, could (should?) load into multiple tables (a star schema).

I definitely do not want babbage to change to reflect the various modelling strategies in an FDP mapping: rather, the loader should work with the mapping to write to a consistent (normalised?) babbage model.

I guess you are currently in the best position to advise on a way forward around that.

pudo (Contributor, Author) commented Dec 7, 2015

There was an interesting thought from @Stiivi (also relevant to #4): he proposed loading the source facts as-is into a JSON column on the fact table, and then generating the actual columns and star schema inside the database, via PostgreSQL commands, as the model is defined.

Otherwise you end up in the spot where, each time someone changes the label for an attribute, you have to re-load 900 MB of source CSV.
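A rough sketch of that load-then-project idea. The proposal targets a PostgreSQL JSONB column; SQLite's json_extract() stands in here only so the snippet is self-contained, and all names are hypothetical:

```python
# Load facts as-is into a JSON column, then derive typed columns inside
# the database once the model is known. Stiivi's proposal targets
# PostgreSQL JSONB; SQLite's json_extract() is used here only so the
# sketch runs anywhere.
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE fact_raw (id INTEGER PRIMARY KEY, doc TEXT)"))
    conn.execute(text("""
        INSERT INTO fact_raw (doc) VALUES
        ('{"program": "Health", "amount": 100}'),
        ('{"program": "Health", "amount": 50}')
    """))

    # When the model (re-)defines an attribute, project it out of the JSON
    # in-database; there is no need to re-load the source CSV.
    conn.execute(text("ALTER TABLE fact_raw ADD COLUMN amount NUMERIC"))
    conn.execute(text(
        "UPDATE fact_raw SET amount = json_extract(doc, '$.amount')"))
    print(conn.execute(text("SELECT SUM(amount) FROM fact_raw")).scalar())  # 150
```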

pwalsh (Member) commented Dec 15, 2015

I'm going to backlog this now. We don't need it yet, and while there are great ideas in here, let's focus on our core tasks first.

pwalsh modified the milestones: Backlog, Current (Dec 15, 2015)
larjohn commented Apr 21, 2016

I am not sure if now is the time to pop this out of the backlog.

We are interested in using such a feature in openbudgets.eu. Leaving aside the backend implementation, on which I can't offer an opinion yet, what is the suggested way to support this API-wise?

Would it be a call to the aggregates/facts etc. endpoints without a cube path part?

akariv (Member) commented Apr 22, 2016

From an API point of view, I would say nothing needs to change.
The API should present the user with a denormalized data table, while using normalized tables in the backend.
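One way to honour that contract, sketched with hypothetical names: expose a view that joins the star schema back into the flat shape the query layer already expects. (On PostgreSQL, a materialised view could serve the same purpose with better aggregation performance.)

```python
# Sketch: normalised tables behind a denormalised view, so queries built
# against flat column names keep working unchanged. Names are hypothetical.
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE dim_program (id INTEGER PRIMARY KEY, label TEXT)"))
    conn.execute(text("CREATE TABLE fact_budget (program_id INTEGER, amount NUMERIC)"))
    conn.execute(text("""
        CREATE VIEW flat_budget AS
        SELECT d.label AS program_label, f.amount
        FROM fact_budget f JOIN dim_program d ON d.id = f.program_id
    """))

    # The API layer can keep issuing the same "single wide table" queries:
    conn.execute(text(
        "SELECT program_label, SUM(amount) FROM flat_budget GROUP BY program_label"))
```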

larjohn commented Apr 22, 2016

Yes, but how do you imagine a call to this service? This call:

http://next.openspending.org/api/3/cubes/405f5e1681d4b6673fbfc0ceb196fe47:boost-armenia-2006-2014/aggregate?drilldown=activity.program&pagesize=30&cut=classification.econ_func%3A%22Economic%22&order=adjusted.sum%3Adesc

already contains the cube name. Will "cube-less" queries be supported somehow?

pudo (Contributor, Author) commented Apr 22, 2016

Just out of interest: what are the semantics of a cube-less call? I imagine it would be quite hard to aggregate budgetary data properly across even slightly different origins. If we're talking RDF here: wouldn't it be less dangerous for a data integration tool to actually materialise merged cubes with well-defined semantics? (e.g. "here's the merged cube of all cohesion fund data sub-cubes")

larjohn commented Apr 22, 2016

Well, a cube-less call is a fictional, ideal call, intended to force us to explore alternative solutions.

We are indeed working with RDF at openbudgets.eu, but we don't want to suggest any direction for openspending based on our expectations of RDF.

This means we don't even care whether openspending's loader materialises the merged cubes. In rudolf, we have to tackle the same issues: determine which dimensions are similar, determine which values of those dimensions are similar (in order to support common cuts), and find a way to get normalized values (mostly currency-related normalization).

Regarding the dimensions: we can either cross our fingers and rely on dimensions inheriting from common parents (with values also linked by some kind of similarity predicate), or go with CONSTRUCT queries, which do materialize the datasets into a new cube. Both ways are implementable on our side; I was just wondering whether, API-wise, there would be some actual super-cube to query in the same way as the ordinary cubes.
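For what it's worth, a rough sketch of the CONSTRUCT route, assuming the W3C RDF Data Cube vocabulary (qb:) and entirely hypothetical endpoint and cube URIs. The query naively re-attaches observations to a new dataset URI; a real merge would first need the dimension-alignment step described above:

```python
# Materialise a merged cube by re-attaching observations from two source
# cubes to a new dataset URI. Endpoint and URIs are hypothetical, and the
# query assumes the cubes' dimensions are already aligned.
from SPARQLWrapper import SPARQLWrapper

MERGE_QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
CONSTRUCT {
  ?obs ?p ?o .
  ?obs qb:dataSet <http://example.org/cube/merged> .
}
WHERE {
  VALUES ?src { <http://example.org/cube/a> <http://example.org/cube/b> }
  ?obs qb:dataSet ?src ;
       ?p ?o .
  FILTER (?p != qb:dataSet)
}
"""

sparql = SPARQLWrapper("http://localhost:3030/ds/sparql")  # hypothetical endpoint
sparql.setQuery(MERGE_QUERY)
graph = sparql.query().convert()  # an rdflib Graph holding the merged cube
```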
