Support query of multiple HDT files from CLI #166
Comments
I think you can achieve that with the ModelFactory#createUnion(Model, Model) method; datasets are usually for named graphs. But if you use it, note that the Union implementation works with only 2 models and uses a HashSet to store the triples it has already seen. Chaining multiple unions (one union per HDT) might consume a lot of memory. Edit: An internal method in HDT-CORE would be better (and harder) to implement, if you can :)
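A minimal sketch of the chaining described above, assuming each HDT file has already been wrapped as a Jena Model (for instance with `ModelFactory.createModelForGraph(new HDTGraph(hdt))`, which is assumed here to match what HDTSparql.java does for a single file). The class name `UnionChain` and the method `unionOf` are hypothetical, introduced only for illustration:

```java
import java.util.List;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Hypothetical helper, not part of hdt-java or Jena.
class UnionChain {
    /**
     * Folds any number of models into one view by nesting two-way unions:
     * union(union(m1, m2), m3) and so on. No triples are copied; each
     * nesting level adds one more Union wrapper around the previous result.
     */
    static Model unionOf(List<Model> models) {
        if (models.isEmpty()) {
            return ModelFactory.createDefaultModel(); // nothing to union
        }
        Model result = models.get(0);
        for (Model next : models.subList(1, models.size())) {
            result = ModelFactory.createUnion(result, next);
        }
        return result;
    }
}
```

The result should be treated as read-only when the inputs are HDT-backed, since HDT graphs do not accept updates. Each nesting level does its own duplicate tracking during iteration, which is where the memory concern above comes from.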
It looks like the Apache Jena API's DatasetFactory, Dataset.addNamedModel, and Dataset.getUnionModel could be combined as another approach. @ate47 - Do you have any thoughts on what the consequences or efficiency of ModelFactory.createUnion would be versus Dataset.getUnionModel? I can take a look at HDT-CORE as well. @ate47 - do you have a class or function you could suggest as a starting point?
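A rough sketch of the dataset-based approach, assuming a Jena version that provides `DatasetFactory.createGeneral()` and `Dataset.getUnionModel()`. The names `DatasetUnion`, `datasetOf`, and `countUnionRows` are hypothetical, and the graph URIs are whatever the caller chooses (one per HDT file would be a natural choice). A general dataset is used on the assumption that it links to the supplied graphs rather than copying their triples:

```java
import java.util.Map;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;

// Hypothetical helper, not part of hdt-java or Jena.
class DatasetUnion {
    /** Registers each model as a named graph under the given URI. */
    static Dataset datasetOf(Map<String, Model> namedModels) {
        Dataset ds = DatasetFactory.createGeneral();
        for (Map.Entry<String, Model> e : namedModels.entrySet()) {
            ds.addNamedModel(e.getKey(), e.getValue());
        }
        return ds;
    }

    /** Runs a SELECT query over the union of all named graphs and counts the rows. */
    static int countUnionRows(Dataset ds, String sparql) {
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, ds.getUnionModel())) {
            ResultSet rs = qe.execSelect();
            int rows = 0;
            while (rs.hasNext()) {
                rs.next();
                rows++;
            }
            return rows;
        }
    }
}
```

Whether this performs better than nested `createUnion` calls on large HDT files is not established here; it only shows how the three API pieces fit together.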
I'm not sure, but from what I remember, you need to run store updates on the main dataset to merge the union model, so you need a Jena model, because the HDT model can't handle updates, and that would take a long time to load and be costly to manage in memory. I'm not an expert on this part, though, so you can try it if you want. To learn the internals of HDT, I would suggest reading this submission about it; then you can start with the Dictionaries, the default implementation (
Querying HDT with SPARQL from the CLI only accepts a single HDT file at a time (https://github.com/rdfhdt/hdt-java/blob/master/hdt-jena/src/main/java/org/rdfhdt/hdtjena/cmd/HDTSparql.java). It would be a useful enhancement if multiple HDT files could be provided, and the query run over the aggregation.
One candidate implementation might use a Jena DatasetFactory for the aggregation, but I have not seen an example of how that might be used. If anyone can post an example of the correct use of Jena for this, then I should be able to implement the feature in HDTSparql.java.