Graph summary results #6

Open
kallimathios opened this issue Jun 20, 2024 · 2 comments


@kallimathios

I receive different numbers for the total number of triples when utilizing the graph summary feature of the graph explorer. I receive different results when I rebuild the graph and duplicate my actions without any changes to the group or environment, and I also receive different results when I restart the environment with a hard refresh. Additionally, results seem to vary when I navigate between groups within an environment. The below examples cover these scenarios.

The following example comes from running the summary in the Development environment for the "All" group. I received two different results without navigating to another group or restarting the environment:
Screenshot 2024-06-20 at 1 05 49 PM

Screenshot 2024-06-20 at 1 07 16 PM

I then tried restarting the environment, and received another different set of results:
Screenshot 2024-06-20 at 1 12 29 PM

While I did not get a screenshot, at one point the system returned 643,169 triples for the "All" group in Development.

I restarted the environment with a hard refresh and generated a summary for the Stage environment and All groups, with the following triples returned:
Screenshot 2024-06-20 at 1 16 51 PM

I then navigated to the next group, California State University, and built the graph within the Stage environment. When I went back to the All group, I received these numbers:
Screenshot 2024-06-20 at 1 22 13 PM

I restarted the environment with a hard refresh and tried to generate a graph summary in the Production environment for the "All" group. I received the following two results without navigating to any other groups or restarting the environment.
Screenshot 2024-06-20 at 1 39 32 PM

Screenshot 2024-06-20 at 1 40 48 PM
@jermnelson
Contributor

jermnelson commented Jun 24, 2024

Thanks @kallimathios for the detailed ticket! When I investigated this issue late last week, I found that at least a partial cause of the different numbers when loading the same graph is the presence of rdf:List structures, which the Resource Templates use to order triples. As a short synopsis, an rdf:List is implemented as a series of blank nodes that, together with the rdf:first and rdf:rest predicates, form an ordered list.

These intermediary rdf:List blank nodes are not being skolemized with deterministic URLs. Instead, each time the same RDF resource is loaded, the blank-node identifiers are randomly generated by the Python rdflib library and show up as new triples. You can replicate this by loading a single resource template URL in the Graph Explorer (this example uses the PCC template https://api.development.sinopia.io/resource/pcc:bf2:Serial:Work).

Screenshot 2024-06-24 at 1 11 58 PM

Doing an initial load in graph explorer results in the following statistics:

Screenshot 2024-06-24 at 1 12 32 PM

We can then run a couple of queries to see how many triples contain rdf:first and rdf:rest:
Screenshot 2024-06-24 at 1 14 35 PM
Screenshot 2024-06-24 at 1 15 32 PM

Now, if we click the Build button again for the same resource, the number of triples increases from 422 to 477:
Screenshot 2024-06-24 at 1 17 35 PM

Re-running the SPARQL queries shows how many triples now contain rdf:first and rdf:rest:
Screenshot 2024-06-24 at 1 18 10 PM

Screenshot 2024-06-24 at 1 20 54 PM

Taking a closer look at the subjects and objects of the rdf:first triples, you can see that the actual blank nodes (e.g. https://api.development.sinopia.io/resource/pcc:bf2:Serial:Work#b43) show up as duplicate subjects for the same rdf:first predicates.

I think a short-term fix is to create a new graph every time Build is clicked instead of loading the resource into the same graph. However, we will still need to address this problem as part of ticket 2.

@kallimathios
Author

Got it - this is super helpful. I will rebuild the graph each time. Also a needed reminder about the functionality to load and investigate a single resource. Thanks so much, @jermnelson !
