Stub Records Identifier with Envelope #722

jeannekitchens · 2024-09-09T14:45:47Z

THIS IS TO INITIATE DISCUSSION AND TO IDENTIFY THE BEST TECHNICAL SOLUTION.

We're going to support use cases for stub records. These are Registry resources that do not meet minimum data policies. We will support publishing these records so that they can have CTIDs. Data consumption must default to complete resources (i.e., the current situation, where all resources in the Registry meet minimum data policies), with an option to also consume stub records. Data consumers therefore need to be able to differentiate stub records from complete resources and to consume either or both.

The envelope needs to identify stub records.

The publishing part will be handled via API, similarly to how it's handled now, but the stub record API will accept less data while still requiring CTIDs.

https://docs.google.com/document/d/12ML6e9psffBjW2d7mnLHzYUnElXBoPxkxNzrGMX3bW8/edit

siuc-nate · 2024-09-09T15:38:35Z

I believe the secondary source publishing mechanism we previously developed was designed to handle this kind of use case.

Recall that it started with needing to augment data that was in Collections:

All we would need to do is have a different minimum data policy for the secondary source records vs the primary source records, and I think the rest is already handled (in terms of the Registry anyway - our side of the secondary-source record implementation would need some work).

The registry already allows filtering search results based on whether the record is primary-source or secondary-source. We could have some other marker in the envelope that indicates whether that record meets our minimum data policy (for non-stub records) that could be used for additional filtering. That could be another enum like we did with primary/secondary source records so we can expand it in the future if needed.

For example:

{
  "envelope_ceterms_ctid": "ce-abcdef...",
  "other envelope properties": "...",
  "resource_publish_type": "secondary",
  "minimum_data_type": "stub"
}

I could see "stub" and "minimum" as two common values, and potentially "benchmark" as a third, though it's unlikely that a resource would meet every benchmark property for a given class.
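
A minimal sketch of how a consumer might read such a marker from an envelope. The `minimum_data_type` key and its values come from the example above; the `MinimumDataType` enum, the `envelope_level` helper, the placeholder CTIDs, and the choice to read a missing marker as "minimum" are assumptions for illustration, not existing Registry behavior.

```python
from enum import Enum


class MinimumDataType(Enum):
    """Levels proposed in this thread (hypothetical, not an implemented Registry enum)."""
    STUB = "stub"
    MINIMUM = "minimum"
    BENCHMARK = "benchmark"


def envelope_level(envelope: dict) -> MinimumDataType:
    """Return the data level recorded on an envelope.

    Assumption: envelopes published before the marker existed have no
    "minimum_data_type" key and already meet the minimum data policy,
    so a missing key is read as MINIMUM.
    """
    raw = envelope.get("minimum_data_type", MinimumDataType.MINIMUM.value)
    return MinimumDataType(raw)  # raises ValueError for an unknown value


# Placeholder envelopes; the CTIDs are made up for this sketch.
envelopes = [
    {"envelope_ceterms_ctid": "ce-0001", "resource_publish_type": "secondary",
     "minimum_data_type": "stub"},
    {"envelope_ceterms_ctid": "ce-0002", "resource_publish_type": "primary"},
]

# Keep only the envelopes that are not stubs.
complete = [e for e in envelopes if envelope_level(e) is not MinimumDataType.STUB]
print([e["envelope_ceterms_ctid"] for e in complete])  # prints ['ce-0002']
```

Using an enum rather than a boolean keeps the door open for additional levels later, which is the same reasoning behind the primary/secondary enum.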

An opportunity to expand on this would involve having Credential Engine maintain minimal records for resources that, today, are stored as bnodes in credentials, for the purpose of having a record that is considered the main record for that resource (in other words, contains the "real" CTID for that resource). Such records would go into the control of the appropriate organization if/when that organization joins Credential Engine. If an organization decides to stop participating with the Registry (or ceases to exist), then its records would also enter Credential Engine's direct control/maintenance, so that they would continue to exist for the sake of any other data that may reference them. I diagrammed a few such use cases (and others) here:

siuc-nate · 2024-09-09T15:41:18Z

Some other alternatives to the above that I can think of might involve:

Creating a Registry community specifically for secondary/stub records, or
Creating a CTDL class with the explicit purpose of being a proxy for a bnode, which gets around the minimum data policy problem and quite possibly the CTID problem as well (since its CTID would be for the wrapper rather than for the underlying thing). This would function sort of like a free-floating pathway component would, where its purpose is to facilitate assertions about some underlying thing without directly controlling the underlying thing. Its bnode could use "sameAs" to point to the "real" resource in the registry if one is available.

The second one, illustrated:

{
  "@id": "https://credentialengineregistry.org/graph/ce-abcdef...",
  "@graph": [
    {
      "@id": "https://credentialengineregistry.org/resources/ce-abcdef...",
      "@type": "ceterms:ResourceWrapper",
      "ceterms:ctid": "ce-abcdef...",
      "ceterms:proxyFor": [
        "_:12345"
      ]
    },
    {
      "@id": "_:12345",
      "@type": "ceterms:Organization",
      "ceterms:name": { "en": "Some org with minimal data" },
      "ceterms:subjectWebpage": "https://example.com/12345",
      "owl:sameAs": "https://credentialengineregistry.org/resources/ce-09876..."
    }
  ]
}
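
To make the intent of the wrapper concrete, here is a rough sketch of how a consumer could walk from the wrapper to the thing it stands in for. `ceterms:ResourceWrapper` and `ceterms:proxyFor` are the hypothetical terms proposed above, not existing CTDL terms; the helper names and the placeholder CTIDs (`ce-wrapper`, `ce-real`) are likewise invented for this example.

```python
from typing import Optional


def resolve_wrapper(graph_doc: dict) -> Optional[dict]:
    """Return the bnode that a ResourceWrapper is a proxy for, or None."""
    nodes = {node["@id"]: node for node in graph_doc.get("@graph", [])}
    for node in nodes.values():
        if node.get("@type") == "ceterms:ResourceWrapper":
            # Follow the first proxyFor reference that exists in this graph.
            for target_id in node.get("ceterms:proxyFor", []):
                if target_id in nodes:
                    return nodes[target_id]
    return None


def canonical_uri(graph_doc: dict) -> Optional[str]:
    """Return the "real" Registry resource named via owl:sameAs, if any."""
    target = resolve_wrapper(graph_doc)
    return None if target is None else target.get("owl:sameAs")


# Placeholder document mirroring the shape of the example above.
doc = {
    "@id": "https://credentialengineregistry.org/graph/ce-wrapper",
    "@graph": [
        {
            "@id": "https://credentialengineregistry.org/resources/ce-wrapper",
            "@type": "ceterms:ResourceWrapper",
            "ceterms:ctid": "ce-wrapper",
            "ceterms:proxyFor": ["_:12345"],
        },
        {
            "@id": "_:12345",
            "@type": "ceterms:Organization",
            "ceterms:name": {"en": "Some org with minimal data"},
            "owl:sameAs": "https://credentialengineregistry.org/resources/ce-real",
        },
    ],
}

print(resolve_wrapper(doc)["ceterms:name"]["en"])
print(canonical_uri(doc))
```

When the bnode has no `owl:sameAs`, `canonical_uri` returns `None` and the consumer would fall back to the minimal data held in the bnode itself.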

siuc-nate · 2024-09-11T15:38:10Z

Yesterday, @jeannekitchens suggested a possible solution that I think would work (and happens to align well with the large diagram in my earlier post) - basically, CE would treat itself as a trusted third-party publisher and use the existing implementation to create organizations and their various resources. Then it's just a matter of:

Altering our policy to allow stub-level records to exist in the registry
Coming up with a way to mark those records as stub records (I suggested a possible solution for that in my earlier post)
Filtering those records out of the search by default (this would likely involve a filtering mechanism similar to the existing search:recordPublishedBy and search:recordPublishType approach that allows filtering based on the envelope - maybe something like search:minimumDataType)

For example, to get all Certificates that meet the minimum data policy or the benchmark policy (if we implement that), nothing about the query would be different from how it is today:

{
  "@type": "ceterms:Certificate"
}

However, if you wanted to also include the stub records, you'd need to specify all of the levels you want:

{
  "@type": "ceterms:Certificate",
  "search:minimumDataType": [ "stub", "minimum", "benchmark" ]
}

To get only stub Certificates:

{
  "@type": "ceterms:Certificate",
  "search:minimumDataType": "stub"
}

Or to get only benchmark Certificates:

{
  "@type": "ceterms:Certificate",
  "search:minimumDataType": "benchmark"
}
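
The default-exclusion behavior in the queries above could be sketched roughly as follows. This is only an illustration of the proposed semantics, not how the Registry search is implemented: `search:minimumDataType` and `minimum_data_type` are the names suggested in this thread, while `DEFAULT_LEVELS`, `requested_levels`, `filter_envelopes`, and the sample CTIDs are hypothetical. It also assumes that an envelope without a marker counts as "minimum".

```python
# Assumption: stubs are hidden unless the query asks for them explicitly.
DEFAULT_LEVELS = frozenset({"minimum", "benchmark"})


def requested_levels(query: dict) -> frozenset:
    """Work out which data levels a query wants returned."""
    value = query.get("search:minimumDataType")
    if value is None:
        return DEFAULT_LEVELS          # query unchanged from today
    if isinstance(value, str):
        return frozenset({value})      # single level, e.g. "stub"
    return frozenset(value)            # explicit list of levels


def filter_envelopes(envelopes: list, query: dict) -> list:
    """Keep the envelopes whose level is among those requested."""
    wanted = requested_levels(query)
    return [
        e for e in envelopes
        if e.get("minimum_data_type", "minimum") in wanted
    ]


# Made-up envelopes for the sketch.
sample = [
    {"envelope_ceterms_ctid": "ce-stub", "minimum_data_type": "stub"},
    {"envelope_ceterms_ctid": "ce-min", "minimum_data_type": "minimum"},
    {"envelope_ceterms_ctid": "ce-bench", "minimum_data_type": "benchmark"},
    {"envelope_ceterms_ctid": "ce-unmarked"},
]

default_hits = filter_envelopes(sample, {"@type": "ceterms:Certificate"})
stub_hits = filter_envelopes(
    sample, {"@type": "ceterms:Certificate", "search:minimumDataType": "stub"}
)
print([e["envelope_ceterms_ctid"] for e in default_hits])
print([e["envelope_ceterms_ctid"] for e in stub_hits])
```

The sketch ignores the `@type` match and looks only at the level, since that is the part of the query that would be new.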

I am a little hesitant about adding anything other than "stub" and "minimum" since it might make data maintenance and querying more of a headache than it's worth, but I'm curious what others think.

jeannekitchens assigned excelsior, edgarf, science and rohit-joy Sep 9, 2024
