Proposal for proper language versioning system #42

Open
rjelliffe opened this issue Jul 11, 2022 · 23 comments
Labels
2025 A change made in preparing the 2025 edition proposed Proposed enhancement

Comments

@rjelliffe
Member

rjelliffe commented Jul 11, 2022

Proposal:
The correct method for producing a new version of Schematron with new syntax is to make a new namespace. This is how the move from Academia Sinica's original Schematron to ISO Schematron was handled. Implementations need a slight tweak to handle it, and then versioning stops being a problem for users.

In particular, I propose

How:

Schematron implementations including existing ones should move to either:

a) preprocess schema files to convert the namespace to the one they understand, after which their normal error reporting mechanism will report if they know the particular element or attribute.

For example, an existing implementation would test for
*:*[starts-with(namespace-uri(), "http://purl.oclc.org/dsdl/schematron-")]
and rename such an element to
{http://purl.oclc.org/dsdl/schematron}*

For example, a Schematron 2 implementation would test for
*:*[starts-with(namespace-uri(), "http://purl.oclc.org/dsdl/schematron")]
and rename such an element to
{http://purl.oclc.org/dsdl/schematron-2}*

For example, a Schematron 3 implementation would test for
*:*[starts-with(namespace-uri(), "http://purl.oclc.org/dsdl/schematron")]
and rename such an element to
{http://purl.oclc.org/dsdl/schematron-3}*
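
Option (a) can be sketched in Python as a small preprocessing pass. This is only an illustration of the renaming idea, not code from any existing Schematron implementation: the function name `normalize_namespace` and the `schematron-2` sample namespace are assumptions made for the example. Like the starts-with() test above, it matches on a URI prefix, so it would also rename elements in any unrelated namespace that happens to share that prefix.

```python
import xml.etree.ElementTree as ET

BASE_NS = "http://purl.oclc.org/dsdl/schematron"


def normalize_namespace(root, target_ns, match_prefix=BASE_NS):
    """Move every element whose namespace URI starts with match_prefix
    into target_ns. Elements in other namespaces are left untouched."""
    for elem in root.iter():
        tag = elem.tag
        # ElementTree stores qualified names as "{namespace-uri}local-name".
        if isinstance(tag, str) and tag.startswith("{"):
            uri, local = tag[1:].split("}", 1)
            if uri.startswith(match_prefix):
                elem.tag = "{%s}%s" % (target_ns, local)
    return root


# A schema written against a hypothetical versioned namespace ...
schema = ET.fromstring(
    '<schema xmlns="http://purl.oclc.org/dsdl/schematron-2">'
    '<pattern><rule context="/"><assert test="true()">ok</assert></rule></pattern>'
    '</schema>'
)
# ... is renamed into the namespace an existing implementation understands.
normalize_namespace(schema, BASE_NS)
```

After this pass the implementation's ordinary validation of the schema decides whether each element and attribute is one it knows, which is the behaviour option (a) relies on.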

b) OR use more complex @match|@select expressions.

For example, an existing implementation that currently looks for sch:pattern etc. would look for
*:pattern[starts-with(namespace-uri(), "http://purl.oclc.org/dsdl/schematron")]
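
The same test, local name plus namespace prefix, might look like this outside XSLT. It is a minimal sketch; `find_schematron` and `split_tag` are hypothetical helper names, and the `schematron-3` and `urn:example:other` namespaces exist only for the example.

```python
import xml.etree.ElementTree as ET

BASE_NS = "http://purl.oclc.org/dsdl/schematron"


def split_tag(tag):
    """Split ElementTree's "{uri}local" form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def find_schematron(root, local_name):
    """Return elements named local_name in any namespace starting with BASE_NS."""
    found = []
    for elem in root.iter():
        uri, local = split_tag(elem.tag)
        if local == local_name and uri.startswith(BASE_NS):
            found.append(elem)
    return found


doc = ET.fromstring(
    '<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron-3" '
    'xmlns:x="urn:example:other">'
    '<s:pattern/><x:pattern/>'
    '</s:schema>'
)
patterns = find_schematron(doc, "pattern")  # only the s:pattern element
```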

N.b. the Schematron 3 implementation would also have a test that rejects earlier version schemas marked experimental:
/*:schema[namespace-uri() = "http://purl.oclc.org/dsdl/schematron-2-experimental"]
so that such schemas have a limited life in the wild, like the dinosaurs in Jurassic Park.
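
A guard of this kind could be sketched as follows. The exception class and function name are invented for the illustration. Note that the check has to run before any prefix-based renaming, because the experimental namespace also starts with the base Schematron URI.

```python
import xml.etree.ElementTree as ET

EXPERIMENTAL_NS = "http://purl.oclc.org/dsdl/schematron-2-experimental"


class ExperimentalSchemaError(ValueError):
    """Raised when a schema is marked with an expired experimental namespace."""


def reject_experimental(root):
    """Refuse schemas whose root element is schema in EXPERIMENTAL_NS."""
    if root.tag == "{%s}schema" % EXPERIMENTAL_NS:
        raise ExperimentalSchemaError(
            "schema uses experimental namespace " + EXPERIMENTAL_NS
        )
    return root
```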

What is the effect of this?

  • The schema has the appropriate version clearly marked up, for humans and implementations.
  • An implementation of the newer version will run existing schemas transparently. The older schema namespace will be converted to the newer one.
  • An implementation of an older version will run new schemas transparently if they only use features supported in the old version. So we can take an existing ISO version 1 schema, convert it up to the new namespace and it will run everywhere, then convert it back to the ISO version 1 namespace and it will still run everywhere.
  • If an implementation is presented with a more recent element from a newer namespace than it supports, it will 1) know that it is a Schematron element, not some foreign element to be passed over, and 2) complain that it does not understand that element and therefore cannot process it.
  • Cut and paste is possible from old namespace schemas to new, without worrying about adjusting the /sch:schema namespace.
  • The library mechanism I propose would be made robust, because versions of the library made with an old namespace would not need updating.
  • Evolving the standard, including adding minor versions, experimental versions and even custom dialects (!), becomes well managed and not a big deal for standards committees.
  • It provides a mechanism by which experimental, draft and pre-standardization features can be tried and added, but labelled appropriately.
  • Schemas that were tried with experimental features would not be converted.
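
The fourth effect in the list above (recognise an element as Schematron, then complain if it is unsupported) can be illustrated with a short check. This is a sketch under assumptions: `KNOWN_ELEMENTS` is a deliberately incomplete, hypothetical list, and `futureFeature` is a made-up element name standing in for something added by a later version.

```python
import xml.etree.ElementTree as ET

BASE_NS = "http://purl.oclc.org/dsdl/schematron"

# Hypothetical and incomplete: the element names this processor supports.
KNOWN_ELEMENTS = {"schema", "pattern", "rule", "assert", "report", "let", "ns", "title"}


def unsupported_elements(root):
    """List (namespace, local name) for Schematron-family elements
    that this processor does not support. Foreign elements are ignored."""
    problems = []
    for elem in root.iter():
        if not elem.tag.startswith("{"):
            continue
        uri, local = elem.tag[1:].split("}", 1)
        if uri.startswith(BASE_NS) and local not in KNOWN_ELEMENTS:
            problems.append((uri, local))
    return problems


doc = ET.fromstring(
    '<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron-3" '
    'xmlns:x="urn:example:other">'
    '<s:pattern/><s:futureFeature/><x:note/>'
    '</s:schema>'
)
problems = unsupported_elements(doc)
```

Here `x:note` is passed over as foreign, while `s:futureFeature` is reported because its namespace marks it as belonging to the Schematron family.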

Why isn't sch:schema/@Version or something good enough? Once again we can see that XSLT and XSD have led the way in how to get least bang-per-buck: in this case with namespaces.

So in XSD and XSLT (and most languages), even though elements have their standard namespace, that is not enough to match the appropriate processor. So you cannot merely take an element's name and namespace URI and know what it may contain. In other words, information that the namespace is supposed to represent is missing. So you have to have one "standard" system for identifying the general semantics of the element (the namespace) and then another ad hoc mechanism for actually identifying which schema/features/operations/version is being used.

(N.b. sch:schema/@schemaVersion documents the schema version, not the version of Schematron being used.)

What is the technical reason people have not used namespaces for versions? The reason is that the large corporate (proprietary and open source) infrastructures were not written to cope with versions in namespaces. Originally many people tried, on the expectation that there would be a one-to-one mapping from namespace to schema, but then found that software had been hardcoded for a particular namespace. And the desire to do databinding, to automatically convert between objects and XML, added another barrier of complexity. The problem was that the XML Namespaces specification was underspecified in the area of versions.

Why doesn't that reason apply to Schematron?

Schematron is in a really nice position of having only a few implementations, so it is possible to build this in ahead of when it is needed. E.g. if XslSch and Oxygen etc put it in soon, it is harmless, but will allow a graceful transition to future versions.

Helpful for determining how to implement other solutions:
If this is adopted, then it helps us understand how other changes should be implemented: any breaking change in semantics must be explicitly and locally marked up.

For example, a new version can add sch:let/@as because that does not change the interpretation of any other element or attribute.

But if we want to alter sch:pattern to act as a ruleset rather than a switch, we must either add some attribute to the sch:pattern or (better) add a new element sch:rule. But we could not use some top-level attribute such as /sch:schema/@patternsAreActuallyRulesets, nor a command-line parameter, nor a different namespace, because none of these survive simple cut and pasting or allow simple inspection.

@rkottmann

Would be great if Schematron moves in this direction of using namespaces.

@AndrewSales
Collaborator

I'm not convinced, but I will consult the Working Group.

@davidcarlisle

As a general rule I don't think namespaces work well for versioning. If you change the namespace you have changed the name of every construct in the language so it's essentially a new language not an incremental version. No existing elements remain and any existing xpath or xslt acting on schematron documents will just see the files as not schematron.

Note xslt, mathml, svg, xhtml etc all have had multiple versions all using the same namespace for essentially this reason.

@xatapult
Collaborator

xatapult commented Feb 6, 2024

I concur with David

@rjelliffe
Member Author

rjelliffe commented Feb 6, 2024 via email

@davidcarlisle

> In what way would you have to change "every construct"?

The name of an element is the namespace-localname pair, so by definition, if you change the namespace it is a new element name.
This means that the new version is not an incremental addition with existing elements plus some new ones and perhaps a few removals. Every construct has a new name, there are no elements in common.

> I hate default namespaces,

default namespace or prefix, the same issue applies.

A query such as

xmlns:sch="http://purl.oclc.org/dsdl/schematron"
...
if (sch:*) then "it's schematron" else "it's not schematron"

will report new files as not schematron, or if you edit it to change the namespace it will report old ones as not schematron.

You could of course test for either namespace just as you could test for a file being schematron or svg, the two vocabularies would be distinct languages with nothing in common.
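
The effect described in this comment is easy to reproduce. The sketch below mimics a name test bound to a single namespace URI, applied to the root element. `NEW_NS` is a hypothetical versioned namespace taken from the proposal's examples, and `is_schematron` is a name invented for the illustration.

```python
import xml.etree.ElementTree as ET

OLD_NS = "http://purl.oclc.org/dsdl/schematron"
NEW_NS = "http://purl.oclc.org/dsdl/schematron-2"  # hypothetical


def is_schematron(xml_text, ns):
    """Exact namespace match on the root element, like sch:* with sch bound to ns."""
    root = ET.fromstring(xml_text)
    return root.tag.startswith("{" + ns + "}")


old_schema = '<schema xmlns="http://purl.oclc.org/dsdl/schematron"/>'
new_schema = '<schema xmlns="http://purl.oclc.org/dsdl/schematron-2"/>'

# A query bound to the old namespace sees only old schemas ...
old_query = (is_schematron(old_schema, OLD_NS), is_schematron(new_schema, OLD_NS))
# ... and a query edited to use the new namespace sees only new ones.
new_query = (is_schematron(old_schema, NEW_NS), is_schematron(new_schema, NEW_NS))
```

Neither binding recognises both documents, so tooling that queries schemas by qualified name would need to test for each namespace explicitly.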

@kosek

kosek commented Feb 6, 2024

I think that the namespace should be changed only if the semantics of an existing element change in the new version. A namespace change in such a case will prevent misinterpretation of the schema by an older implementation. But changing the namespace only because a new version adds a few minor functions would only confuse users.

@murata2makoto

Changing namespaces is useful for preventing old applications from providing incorrect results for data conforming to newer versions. RELAX NG and CREPDL did change their namespaces for this reason.

@AndrewSales
Collaborator

AndrewSales commented Feb 7, 2024

I can see that this approach works where a significant break with previous versions needs to be signalled. It has been used in the move from pre-ISO Schematron to ISO Schematron or DocBook 4.5 to 5.0, but I do not see a strong use case this time for Schematron.

Changing the namespace in the next edition of the standard would invalidate all extant Schematron schemas. The task of updating the namespace may become non-trivial for users whose schemas are built from many (often re-used) components, for example.

I would not rule it out in future, but I think for now the XSLT-style approach (one namespace across versions) is preferable.

@rjelliffe
Member Author

rjelliffe commented Feb 8, 2024 via email

@rjelliffe
Member Author

rjelliffe commented Feb 8, 2024 via email

@AndrewSales AndrewSales added proposed Proposed enhancement deferred Deferred until a future revision labels Feb 10, 2024
@murata2makoto

Suppose that we do not change the namespace. Then, what will existing Schematron implementations do for Schematron schemas containing new features in the upcoming version? Will they crash? Will they report incorrect results? Without knowing what will happen, I don't think that we can make a reasonable decision.

@dmj
Member

dmj commented Feb 21, 2024

> Suppose that we do not change the namespace. Then, what will existing Schematron implementations do for Schematron schemas containing new features in the upcoming version? Will they crash? Will they report incorrect results? Without knowing what will happen, I don't think that we can make a reasonable decision.

Per 7.2 "[a] full-conformance implementation shall be able to determine for any XML document whether it is a correct schema." As an implementer I read this as: the processor should terminate with an error if the document contains an unknown element from the ISO Schematron namespace.

The requirement for a simple-conformance implementation in 7.1 changed in the 3rd edition.

It now reads: "A simple-conformance implementation shall be able to report for any XML document that it does not conform to the constraints expressed by a given Schematron schema."

While in the previous version it read: "A simple-conformance implementation shall be able to report for any XML document that its structure does not conform to that of a valid Schematron schema."

I don't know why the definition of a simple-conformance implementation was changed in such a drastic way. If we go with the 2nd edition, then I would also read it as: terminate with an error if the document contains an unknown element from the ISO Schematron namespace.

@rjelliffe
Member Author

rjelliffe commented Feb 21, 2024 via email

@AndrewSales
Collaborator

AndrewSales commented Feb 24, 2024

I think the change to the definition of simple conformance in 2020 shifts the onus onto the user to provide a valid schema in the first place. It adds a bullet referring to Annex B, the Schematron schema for Schematron schemas, as if to emphasize this.

> what will existing Schematron implementations do for Schematron schemas containing new features in the upcoming version?

I think this hinges on backward compatibility. The new features so far have, I think, only been additive, so a schema valid to the current standard should also be valid to the next edition. Leaving the namespace unchanged smooths the upgrade path for schema maintainers who want to make use of the new features.

Strictly speaking, there is only one edition of an ISO standard, the latest one, which supersedes all previous editions. So an implementation claiming conformance is either up to date or not.

But we could at least say that conformant implementations should (i.e. advised, not required) report nodes in the Schematron namespace that they do not recognize. (By the way, I find the wording "shall be able to report[...]" a little unsatisfactory: mandating the ability to do something is at one remove from the actual doing.)

@murata2makoto

A schema conforming to the next version may have syntactical constructs not explicitly allowed in the present version. Such constructs are typically unavoidable. I believe that everybody agrees.

Suppose that a document is not valid against a schema conforming to the next version. Existing validators conforming to the previous version are not always able to report that the given document is invalid against the schema. This is because existing implementations cannot handle new constructs appropriately. Again, I believe that everybody agrees.

However, we should prevent existing validators from incorrectly reporting validity without any warnings or errors. This is a must.

Meanwhile, there are some desiderata. They would be very nice.

D1: A schema conforming to the present version of ISO Schematron conforms to the next version.

D2: A document valid against a schema conforming to the present version is reported as valid even if validation is done by a validator conforming to the next version.

D3: A document valid against a schema conforming to the next version is reported as valid even if validation is done by a validator conforming to the previous version.

Changing the Schematron namespace destroys D1, D2, and D3, and is thus acceptable only when it is needed for preventing incorrect validity reports.

@rjelliffe
Member Author

rjelliffe commented Feb 25, 2024 via email

@murata2makoto

D3 is not impossible if new elements belong to different namespaces.

@AndrewSales
Collaborator

I would be uneasy about a mixture of namespaces to represent Schematron structures, if that's what is being suggested. It would likely prove fiddly for users and raises the question of what would happen in a putative edition after next, if that added more new things.

> we should prevent existing validators from incorrectly reporting validity without any warnings or errors

Could we effectively invert this, by requiring implementations conformant to the next version to report as much, at user option?

Existing validators could not do so, so would not risk being accidentally conformant.

The less invasive versioning technique would be a new attribute on element schema, which could also be included in SVRL output or otherwise reported.

@murata2makoto

> I would be uneasy about a mixture of namespaces to represent Schematron structures, if that's what is being suggested

I mentioned that possibility just as a theoretical possibility. Here is yet another possibility. Had we allowed validators conforming to the previous version to skip unknown elements in the existing schematron namespace, we could have achieved D3.

I am reluctant to introduce a new namespace for the next version of Schematron, and prefer the version attribute.

@AndrewSales
Collaborator

The gist of the current draft is:

  • introduce a new attribute, schematronVersion, on root element schema
  • it is optional and has a single allowed value, 2025
  • implementations must report this attribute and its value or its absence, at user option.
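
How an implementation might satisfy the reporting requirement can be sketched as below. This is an illustration only: the attribute name and its single allowed value come from the draft summary above, while the function name and the message wording are assumptions.

```python
import xml.etree.ElementTree as ET

ALLOWED_VERSIONS = {"2025"}  # the single allowed value in the draft summary


def describe_schematron_version(root):
    """Report the schematronVersion attribute on the root element, or its absence."""
    value = root.get("schematronVersion")
    if value is None:
        return "schematronVersion: absent"
    if value not in ALLOWED_VERSIONS:
        return "schematronVersion: %s (not an allowed value)" % value
    return "schematronVersion: %s" % value


marked = ET.fromstring(
    '<schema xmlns="http://purl.oclc.org/dsdl/schematron" schematronVersion="2025"/>'
)
unmarked = ET.fromstring('<schema xmlns="http://purl.oclc.org/dsdl/schematron"/>')

marked_report = describe_schematron_version(marked)
unmarked_report = describe_schematron_version(unmarked)
```

Because the attribute is optional, the absent case is reported rather than treated as an error, which keeps existing schemas valid.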

@AndrewSales AndrewSales added 2025 A change made in preparing the 2025 edition and removed deferred Deferred until a future revision labels Mar 11, 2024
AndrewSales added a commit to Schematron/schema that referenced this issue Mar 11, 2024
@rjelliffe
Member Author

rjelliffe commented Mar 12, 2024 via email

@AndrewSales
Collaborator

I've gone with schematronEdition in the working draft.
