Stub Records Identifier with Envelope #722

Open
jeannekitchens opened this issue Sep 9, 2024 · 3 comments

@jeannekitchens

jeannekitchens commented Sep 9, 2024

THIS IS TO INITIATE DISCUSSION AND TO IDENTIFY THE BEST TECHNICAL SOLUTION.

We're going to support use cases for stub records. These are Registry resources that do not meet minimum data policies. We will support publishing these records so that they can have CTIDs. It is essential that data consumption defaults to complete resources (the current situation, in which all Registry resources meet minimum data policies), with an option to also consume stub records. Data consumers therefore need to be able to distinguish stub records from complete resources and to consume either or both.

The envelope needs to identify stub records.

The publishing part will be handled via API, similarly to how it's handled now, but the stub record API will allow for less data while still requiring CTIDs.

https://docs.google.com/document/d/12ML6e9psffBjW2d7mnLHzYUnElXBoPxkxNzrGMX3bW8/edit

@siuc-nate
Collaborator

siuc-nate commented Sep 9, 2024

I believe the secondary source publishing mechanism we previously developed was designed to handle this kind of use case.

Recall that it started with needing to augment data that was in Collections:
Primary and Secondary Source data with Collections

All we would need to do is have a different minimum data policy for the secondary source records vs the primary source records, and I think the rest is already handled (in terms of the Registry anyway - our side of the secondary-source record implementation would need some work).

The registry already allows filtering search results based on whether the record is primary-source or secondary-source. We could have some other marker in the envelope that indicates whether that record meets our minimum data policy (for non-stub records) that could be used for additional filtering. That could be another enum like we did with primary/secondary source records so we can expand it in the future if needed.

For example:

{
  "envelope_ceterms_ctid": "ce-abcdef...",
  "other envelope properties": "...",
  "resource_publish_type": "secondary",
  "minimum_data_type": "stub"
}

I could see "stub" and "minimum" as two common values, and potentially "benchmark" as a third, though it's unlikely that a resource would meet every benchmark property for a given class.
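A minimal sketch of how a data consumer might read the proposed marker. It assumes the hypothetical `minimum_data_type` envelope property from the example above, and it assumes that an envelope without the property is treated as "minimum" (since every record in the Registry today meets the minimum data policy). The enum and helper names are illustrative only and are not part of any existing Registry API.

```python
from enum import Enum


class MinimumDataType(Enum):
    """Hypothetical values for the proposed `minimum_data_type` envelope property."""
    STUB = "stub"
    MINIMUM = "minimum"
    BENCHMARK = "benchmark"


def minimum_data_type(envelope: dict) -> MinimumDataType:
    # Assumption: envelopes published before the marker existed carry no value
    # and are treated as meeting the minimum data policy.
    return MinimumDataType(envelope.get("minimum_data_type", "minimum"))


def is_stub(envelope: dict) -> bool:
    return minimum_data_type(envelope) is MinimumDataType.STUB


# Envelope from the example above (CTID left elided as in the original).
envelope = {
    "envelope_ceterms_ctid": "ce-abcdef...",
    "resource_publish_type": "secondary",
    "minimum_data_type": "stub",
}
```

Using an enum rather than a boolean mirrors the primary/secondary approach and leaves room for further values later; an unknown value raises `ValueError` instead of being silently accepted.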

An opportunity to expand on this would involve having Credential Engine maintain minimal records for resources that, today, are stored as bnodes in credentials, for the purpose of having a record that is considered the main record for that resource (in other words, contains the "real" CTID for that resource). Such records would go into the control of the appropriate organization if/when that organization joins Credential Engine. If an organization decides to stop participating with the Registry (or ceases to exist), then its records would also enter Credential Engine's direct control/maintenance, so that they would continue to exist for the sake of any other data that may reference them. I diagrammed a few such use cases (and others) here:
Stub and Secondary Source Records

@siuc-nate
Collaborator

siuc-nate commented Sep 9, 2024

Some other alternatives to the above that I can think of might involve:

  • Creating a Registry community specifically for secondary/stub records, or
  • Creating a CTDL class with the explicit purpose of being a proxy for a bnode, which gets around the minimum data policy problem and quite possibly the CTID problem as well (since its CTID would be for the wrapper rather than for the underlying thing). This would function sort of like a free-floating pathway component would, where its purpose is to facilitate assertions about some underlying thing without directly controlling the underlying thing. Its bnode could use "sameAs" to point to the "real" resource in the registry if one is available.

The second one, illustrated:

{
  "@id": "https://credentialengineregistry.org/graph/ce-abcdef...",
  "@graph": [
    {
      "@id": "https://credentialengineregistry.org/resources/ce-abcdef...",
      "@type": "ceterms:ResourceWrapper",
      "ceterms:ctid": "ce-abcdef...",
      "ceterms:proxyFor": [
        "_:12345"
      ]
    },
    {
      "@id": "_:12345",
      "@type": "ceterms:Organization",
      "ceterms:name": { "en": "Some org with minimal data" },
      "ceterms:subjectWebpage": "https://example.com/12345",
      "owl:sameAs": "https://credentialengineregistry.org/resources/ce-09876..."
    }
  ]
}
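As a rough illustration of how a consumer could follow such a wrapper to the thing it stands in for, the sketch below walks the `@graph` shown above. `ceterms:ResourceWrapper` and `ceterms:proxyFor` are the hypothetical terms from this proposal, not existing CTDL terms, and `resolve_wrappers` is an illustrative helper name.

```python
def resolve_wrappers(graph_doc):
    """Return one summary per node that a ResourceWrapper in the graph proxies for."""
    nodes = {node["@id"]: node for node in graph_doc["@graph"]}
    resolved = []
    for node in graph_doc["@graph"]:
        if node.get("@type") != "ceterms:ResourceWrapper":
            continue
        for target_id in node.get("ceterms:proxyFor", []):
            target = nodes.get(target_id)
            if target is None:
                continue  # proxy target is not in this graph
            resolved.append({
                "wrapper_ctid": node["ceterms:ctid"],
                "type": target.get("@type"),
                # None when no "real" Registry resource is available yet
                "same_as": target.get("owl:sameAs"),
            })
    return resolved


# The illustrated graph from above (identifiers left elided as in the original).
doc = {
    "@id": "https://credentialengineregistry.org/graph/ce-abcdef...",
    "@graph": [
        {
            "@id": "https://credentialengineregistry.org/resources/ce-abcdef...",
            "@type": "ceterms:ResourceWrapper",
            "ceterms:ctid": "ce-abcdef...",
            "ceterms:proxyFor": ["_:12345"],
        },
        {
            "@id": "_:12345",
            "@type": "ceterms:Organization",
            "ceterms:name": {"en": "Some org with minimal data"},
            "ceterms:subjectWebpage": "https://example.com/12345",
            "owl:sameAs": "https://credentialengineregistry.org/resources/ce-09876...",
        },
    ],
}

summaries = resolve_wrappers(doc)
```

Here the wrapper's CTID identifies the wrapper, while `same_as` carries the pointer to the "real" resource when one exists.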

@siuc-nate
Collaborator

Yesterday, @jeannekitchens suggested a possible solution that I think would work (and happens to align well with the large diagram in my earlier post): CE would treat itself as a trusted third-party publisher and use the existing implementation to create organizations and their various resources. Then it's just a matter of:

  • Altering our policy to allow stub-level records to exist in the registry
  • Coming up with a way to mark those records as stub records (I suggested a possible solution for that in my earlier post)
  • Filtering those records out of the search by default (this would likely involve a filtering mechanism similar to the existing search:recordPublishedBy and search:recordPublishType approach that allows filtering based on the envelope - maybe something like search:minimumDataType)

For example, to get all Certificates that meet the minimum data policy or the benchmark policy (if we implement that), nothing about the query would be different from how it is today:

{
  "@type": "ceterms:Certificate"
}

However, if you wanted to also include the stub records, you'd need to specify all of the levels you want:

{
  "@type": "ceterms:Certificate",
  "search:minimumDataType": [ "stub", "minimum", "benchmark" ]
}

To get only stub Certificates:

{
  "@type": "ceterms:Certificate",
  "search:minimumDataType": "stub"
}

Or to get only benchmark Certificates:

{
  "@type": "ceterms:Certificate",
  "search:minimumDataType": "benchmark"
}

I am a little hesitant about adding anything other than "stub" and "minimum" since it might make data maintenance and querying more of a headache than it's worth, but I'm curious what others think.
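The default-exclusion behavior described in the queries above could be sketched as follows. This is only an illustration of the intended semantics, not the Registry's search implementation: `search:minimumDataType` and `minimum_data_type` are the hypothetical names from this thread, and each record here flattens the resource type and the envelope marker into one dict for brevity.

```python
# Assumption: when the query does not mention the filter, stubs are excluded.
DEFAULT_LEVELS = {"minimum", "benchmark"}


def requested_levels(query):
    """Normalize the filter to a set; it may be absent, a string, or a list."""
    value = query.get("search:minimumDataType")
    if value is None:
        return DEFAULT_LEVELS
    if isinstance(value, str):
        return {value}
    return set(value)


def search(query, records):
    levels = requested_levels(query)
    return [
        record for record in records
        if record["@type"] == query["@type"]
        # Assumption: records without a marker meet the minimum data policy.
        and record.get("minimum_data_type", "minimum") in levels
    ]


records = [
    {"@type": "ceterms:Certificate", "minimum_data_type": "stub"},
    {"@type": "ceterms:Certificate", "minimum_data_type": "minimum"},
    {"@type": "ceterms:Certificate", "minimum_data_type": "benchmark"},
]

default_results = search({"@type": "ceterms:Certificate"}, records)
stub_results = search(
    {"@type": "ceterms:Certificate", "search:minimumDataType": "stub"}, records
)
```

With this shape, existing queries keep returning only complete resources, and a consumer opts in to stubs by naming every level it wants.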
