Split ODD into separate files and then combine in the compilation process? #66

Open
ingoboerner opened this issue Jun 14, 2024 · 7 comments

Comments

@ingoboerner
Collaborator

ingoboerner commented Jun 14, 2024

As @cmil, @lehkost and I discussed briefly at the CCLS conference, the current ODD file feels cluttered and is quite hard to maintain. We may want to rework and modularize it so that we can adapt it to specific corpora more easily:

  1. Maybe use "tei_all" as the base in the first place (minimal requirement: files validate against tei_all); include everything that is there, i.e. except="" on moduleRef; perhaps only from the relevant modules, i.e.
<moduleRef key="core" except=""/>
<moduleRef key="tei" except=""/>
<moduleRef key="header" except=""/>
<moduleRef key="textstructure" except=""/>
<moduleRef key="drama" except=""/>
<moduleRef key="namesdates" except=""/>
<moduleRef key="corpus" except=""/>
<moduleRef key="linking" except=""/>
<moduleRef key="figures" except=""/>
<moduleRef key="analysis" except=""/>

compare to current version:

dracor-schema/dracor.odd

Lines 527 to 550 in 00fb7ea

<moduleRef key="core"
include="author bibl biblScope cit date desc editor emph foreign graphic head l label lb lg name note p pb publisher pubPlace ref resp respStmt sp speaker stage term title quote"/>
<moduleRef key="tei" except=""/>
<moduleRef key="header"
include="availability change classCode fileDesc idno keywords licence listChange profileDesc publicationStmt revisionDesc sourceDesc teiHeader textClass titleStmt"/>
<moduleRef key="textstructure"
include="TEI argument back body div dateline docAuthor docTitle epigraph front text titlePage titlePart signed trailer"/>
<moduleRef key="drama"
include="actor castGroup castItem castList performance role roleDesc set spGrp"/>
<moduleRef key="namesdates"
include="event forename genName listEvent listPerson listRelation nameLink person personGrp persName relation surname"/>
<moduleRef key="corpus" include="particDesc"/>
<moduleRef key="linking" include="ab standOff"/>
<moduleRef key="figures" include="figure"/>
<moduleRef key="analysis" include=""/>

This file might do nothing else.

  2. We then need to restrict the usage of some elements, or change content models and the values of some attributes that are relevant to the API. These changes affect elements/attributes that we need to restrict because the API to some degree expects to find certain things (e.g. @xml:id on the root <TEI>) and gets confused by unexpected things, e.g. multiple <text> elements as in some swedracor files.

  3. Take (2) and add examples (<exemplum>) where we have them for certain elements, e.g.
    currently

    dracor-schema/dracor.odd

    Lines 771 to 869 in 00fb7ea

    <!-- author -->
    <elementSpec ident="author" module="core" mode="change">
    <attList>
    <attDef ident="xml:id" mode="delete"/>
    <attDef ident="xml:lang" mode="delete"/>
    <attDef ident="xml:base" mode="delete"/>
    <attDef ident="xml:space" mode="delete"/>
    <attDef ident="n" mode="delete"/>
    <attDef ident="rend" mode="delete"/>
    <attDef ident="style" mode="delete"/>
    <attDef ident="rendition" mode="delete"/>
    <attDef ident="cert" mode="delete"/>
    <attDef ident="resp" mode="delete"/>
    <attDef ident="source" mode="delete"/>
    <attDef ident="role" mode="delete"/>
    <attDef ident="nymRef" mode="delete"/>
    <attDef ident="key" mode="delete"/>
    <attDef ident="calendar" mode="delete"/>
    <attDef ident="period" mode="delete"/>
    <attDef ident="when" mode="delete"/>
    <attDef ident="notBefore" mode="delete"/>
    <attDef ident="notAfter" mode="delete"/>
    <attDef ident="from" mode="delete"/>
    <attDef ident="to" mode="delete"/>
    <attDef ident="when-iso" mode="delete"/>
    <attDef ident="notBefore-iso" mode="delete"/>
    <attDef ident="notAfter-iso" mode="delete"/>
    <attDef ident="from-iso" mode="delete"/>
    <attDef ident="to-iso" mode="delete"/>
    <attDef ident="when-custom" mode="delete"/>
    <attDef ident="notBefore-custom" mode="delete"/>
    <attDef ident="notAfter-custom" mode="delete"/>
    <attDef ident="from-custom" mode="delete"/>
    <attDef ident="to-custom" mode="delete"/>
    <attDef ident="datingPoint" mode="delete"/>
    <attDef ident="datingMethod" mode="delete"/>
    <attDef ident="corresp" mode="delete"/>
    <attDef ident="synch" mode="delete"/>
    <attDef ident="sameAs" mode="delete"/>
    <attDef ident="copyOf" mode="delete"/>
    <attDef ident="next" mode="delete"/>
    <attDef ident="prev" mode="delete"/>
    <attDef ident="exclude" mode="delete"/>
    <attDef ident="select" mode="delete"/>
    <attDef ident="ana" mode="delete"/>
    <attDef ident="ref" mode="delete"/>
    </attList>
    <exemplum source="#ger000546">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
    <author>
    <persName>
    <forename>Andreas</forename>
    <surname>Gryphius</surname>
    </persName>
    <idno type="wikidata">Q77214</idno>
    <idno type="pnd">118543032</idno>
    </author>
    </egXML>
    <ab> Encoding of the author "Andreas Gryphius" of the play <ref
    target="https://dracor.org/id/ger000546">Leo Armenius oder
    Fürsten-Mord</ref>. </ab>
    </exemplum>
    <exemplum source="#rus000205">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
    <author>
    <persName>
    <forename>Владимир</forename>
    <forename type="patronym">Иванович</forename>
    <surname>Бельский</surname>
    </persName>
    <persName xml:lang="eng">
    <forename>Vladimir</forename>
    <surname>Belsky</surname>
    </persName>
    <idno type="wikidata">Q1259652</idno>
    </author>
    </egXML>
    <ab>Encoding of the author "Владимир Иванович Бельский" of the play <ref
    target="https://dracor.org/id/rus000205">Сказание о невидимом
    граде Китеже и деве Февронии</ref>.</ab>
    </exemplum>
    <remarks>
    <ab>For additional information on the encoding of author names and the
    rationale also see the following GitHub issues:
    <list>
    <item>
    <ref type="githubissue"
    target="https://github.com/dracor-org/dracor-api/issues/119"
    >https://github.com/dracor-org/dracor-api/issues/119</ref>
    </item>
    <item>
    <ref type="githubissue"
    target="https://github.com/dracor-org/dracor-schema/issues/21"
    >https://github.com/dracor-org/dracor-schema/issues/21</ref>
    </item>
    </list>
    </ab>
    </remarks>
    </elementSpec>

    In the "examples ODD" file we would keep only the following (maybe rework @source so that it uses a prefix defined with <prefixDef> in <listPrefixDef>):

<!-- author -->
                    <elementSpec ident="author" module="core" mode="change">
                        <exemplum source="#ger000546">
                            <egXML xmlns="http://www.tei-c.org/ns/Examples">
                                <author>
                                    <persName>
                                        <forename>Andreas</forename>
                                        <surname>Gryphius</surname>
                                    </persName>
                                    <idno type="wikidata">Q77214</idno>
                                    <idno type="pnd">118543032</idno>
                                </author>
                            </egXML>
                            <ab> Encoding of the author "Andreas Gryphius" of the play <ref
                                    target="https://dracor.org/id/ger000546">Leo Armenius oder
                                    Fürsten-Mord</ref>. </ab>
                        </exemplum>
                        <exemplum source="#rus000205">
                            <egXML xmlns="http://www.tei-c.org/ns/Examples">
                                <author>
                                    <persName>
                                        <forename>Владимир</forename>
                                        <forename type="patronym">Иванович</forename>
                                        <surname>Бельский</surname>
                                    </persName>
                                    <persName xml:lang="eng">
                                        <forename>Vladimir</forename>
                                        <surname>Belsky</surname>
                                    </persName>
                                    <idno type="wikidata">Q1259652</idno>
                                </author>
                            </egXML>
                            <ab>Encoding of the author "Владимир Иванович Бельский" of the play <ref
                                    target="https://dracor.org/id/rus000205">Сказание о невидимом
                                    граде Китеже и деве Февронии</ref>.</ab>
                        </exemplum>
                        <remarks>
                            <ab>For additional information on the encoding of author names and the
                                rationale also see the following GitHub issues:
                                <list>
                                    <item>
                                        <ref type="githubissue"
                                            target="https://github.com/dracor-org/dracor-api/issues/119"
                                            >https://github.com/dracor-org/dracor-api/issues/119</ref>
                                    </item>
                                    <item>
                                        <ref type="githubissue"
                                            target="https://github.com/dracor-org/dracor-schema/issues/21"
                                            >https://github.com/dracor-org/dracor-schema/issues/21</ref>
                                    </item>
                                </list>
                            </ab>
                        </remarks>
                    </elementSpec>
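For comparison, a minimal sketch of what the restriction layer from step 2 could look like, assuming we make @xml:id required on the root element and flag files with more than one <text>. The ident single_text_in_TEI and the message wording are made up for illustration and are not taken from the current dracor.odd:

```xml
<!-- Sketch only: restrictions on the root element that the API relies on -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="TEI" module="textstructure" mode="change">
  <!-- hypothetical constraint: the API expects a single text element -->
  <constraintSpec ident="single_text_in_TEI" scheme="schematron" mode="add">
    <desc>Checks that <gi>TEI</gi> has exactly one <gi>text</gi> child.</desc>
    <constraint>
      <sch:rule context="tei:TEI">
        <sch:assert test="count(tei:text) = 1">A DraCor file should contain
          exactly one text element.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  <attList>
    <!-- the play ID lives in @xml:id, so make the attribute mandatory -->
    <attDef ident="xml:id" mode="change" usage="req"/>
  </attList>
</elementSpec>
```

A file like this would contain only such restrictions, while the examples and remarks would stay in the separate "examples ODD".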

But the question remains: how do we put that together in the end?

@ingoboerner ingoboerner changed the title Split up ODD into separate files and then combine in the compilation process? Split ODD into separate files and then combine in the compilation process? Jun 14, 2024
@ingoboerner
Collaborator Author

ingoboerner commented Jun 14, 2024

TEI stylesheet for merging TEI ODD specification with source to make a new source document.
https://tei-c.org/release/doc/tei-xsl/odds/odd2odd0.html#odd2odd.xsl

This little guide is intended to explain the mechanism of ODD chaining. An ODD file specifies
a particular view of the TEI, by selecting particular elements, attributes, etc. from the whole
of the TEI. But you can also refine such a specification further, making your ODD derive from
another one. In principle you can chain together ODDs in this way as much as you like. You
can use this feature in several different ways:
• you can add additional restrictions to an existing ODD, for example by changing the value
list of an attribute
• you can further reduce the subset of elements provided by an existing ODD
• you can add new elements or modules to an existing ODD

One [@source] with the value
‘mySuperODD.subset.xml’ will go looking for declarations in a file of that name in the current
source tree. And one with the value ‘http://example.com/superODDs/anotherSubset.xml’ will
go looking for it at the URL indicated.

https://teic.github.io/PDF/howtoChain.pdf
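
Applied to our case, chaining might look roughly like the sketch below. The file name dracor-base.compiled.xml and the ident dracor-examples are hypothetical. As far as I read the guide, the file named in @source should be a compiled ODD (the output of odd2odd.xsl) and not the customization file itself, so treat that as an assumption to verify:

```xml
<!-- Sketch only: "dracor-base.compiled.xml" is a hypothetical file name for the
     compiled output of the base ODD; "dracor-examples" is a made-up ident. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="dracor-examples" source="dracor-base.compiled.xml" start="TEI">
  <!-- take over the modules as they are declared in the base ODD -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="drama"/>
  <!-- layer documentation on top of the inherited declaration of author -->
  <elementSpec ident="author" module="core" mode="change">
    <exemplum source="#ger000546">
      <egXML xmlns="http://www.tei-c.org/ns/Examples">
        <author>
          <persName>
            <forename>Andreas</forename>
            <surname>Gryphius</surname>
          </persName>
          <idno type="wikidata">Q77214</idno>
        </author>
      </egXML>
    </exemplum>
  </elementSpec>
</schemaSpec>
```

With this layout the base ODD would hold the module selection and restrictions, and the chained ODD would only add examples and remarks.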

@cmil
Member

cmil commented Jun 14, 2024

How about using the TEI Drama ODD provided by the TEI Consortium (also available in TEI Roma) as the source for the DraCor ODD? We would have to add some elements like particDesc, standOff and listEvent, which seem to be omitted there, and adjust some content models. But then we would perhaps already have a reasonable starting point.

@ingoboerner
Collaborator Author

ingoboerner commented Jun 17, 2024

So we would need to use the https://tei-c.org/release/xml/tei/custom/odd/tei_drama.odd in the @source of <schemaSpec> and hope for the best? The old <schemaSpec> already included a good subset of elements I think. Will test it with the drama ODD though.

The legacy ODD included 82 elements; if I included all modules that were in the legacy ODD in full, we would end up with 315 elements.

The TEI Drama ODD includes the following modules:

<schemaSpec ident="tei_drama" start="TEI teiCorpus">
        <moduleRef key="header"/>
        <moduleRef key="core"/>
        <moduleRef key="tei"/>
        <moduleRef key="textstructure"/>
        <moduleRef key="linking"/>
        <moduleRef key="drama"/>
<!-- ... -->

The schema contains 226 elements.

@cmil
Member

cmil commented Jun 19, 2024

So we would need to use the https://tei-c.org/release/xml/tei/custom/odd/tei_drama.odd in the @source of <schemaSpec> and hope for the best? The old <schemaSpec> already included a good subset of elements I think. Will test it with the drama ODD though.

We could use Roma to start from the TEI Drama ODD, add the missing elements there and then use the resulting ODD for further refinement to our purposes.

@ingoboerner
Collaborator Author

ingoboerner commented Jun 19, 2024

I already copied it together in my local draft of the ODD. It seems to work without @source, by explicitly re-using the module selection of the Drama ODD:

<div xml:id="div_schema">
                <head>Schema</head>
                <schemaSpec ident="dracor-api" docLang="en" prefix="tei_" xml:lang="en" start="TEI">

                    <!-- modules included in the tei_drama ODD:
                header, core, tei, textstructure, linking, drama
                -->
                    <moduleRef key="header"/>
                    <moduleRef key="core"/>
                    <moduleRef key="tei"/>
                    <moduleRef key="textstructure" except="div1 div2 div3 div4 div5 div6 div7"/>
                    <moduleRef key="linking"/>
                    <moduleRef key="drama"/>

                    <!-- The dracor-legacy ODD also included additional elements from the following modules: -->

                    <moduleRef key="namesdates"
                        include="event forename genName listEvent listPerson listRelation nameLink person personGrp persName relation surname"/>
                    <moduleRef key="corpus" include="particDesc"/>
                    <moduleRef key="figures" include="figure"/>
<!-- ... -->
</schemaSpec>

This results in 233 elements. Maybe we can later go through the element list and kick some of them out again. The next step would be to look into the requirements of the API, e.g. the specific encoding of the digital and original sources in the <bibl> elements in <sourceDesc>. I would do that with Schematron, e.g.

<!-- sourceDesc -->
                    <elementSpec ident="sourceDesc" module="header" mode="change">
                        <constraintSpec ident="digital_source_in_sourceDesc" scheme="schematron"
                            mode="add">
                            <desc>Checks if a digital source is present in the
                                <gi>sourceDesc</gi></desc>
                            <constraint>
                                <sch:rule context="tei:sourceDesc">
                                    <sch:assert test="tei:bibl[@type eq 'digitalSource']">Digital
                                        source is missing </sch:assert>
                                </sch:rule>
                            </constraint>
                        </constraintSpec>
                        <constraintSpec ident="original_source_in_sourceDesc" scheme="schematron"
                            mode="add">
                            <desc>Checks if an original source for a digital source is
                                available</desc>
                            <constraint>
                                <sch:rule
                                    context="tei:sourceDesc/tei:bibl[@type eq 'digitalSource']">
                                    <sch:assert test="tei:bibl[@type eq 'originalSource']">Original
                                        Source for digital source is missing </sch:assert>
                                </sch:rule>
                            </constraint>
                        </constraintSpec>
                    </elementSpec>
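
To illustrate what these two rules expect, here is a minimal sketch of a <sourceDesc> that should pass both: a <bibl type="digitalSource"> directly inside <sourceDesc>, with a <bibl type="originalSource"> nested inside it. The bibliographic content is placeholder text and not taken from a real DraCor file:

```xml
<!-- Sketch only: placeholder content, structure derived from the two rules above -->
<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <!-- satisfies digital_source_in_sourceDesc -->
  <bibl type="digitalSource">
    <name>Name of the digital edition</name>
    <!-- satisfies original_source_in_sourceDesc: nested inside the digital source -->
    <bibl type="originalSource">
      <title>Title of the printed edition</title>
    </bibl>
  </bibl>
</sourceDesc>
```

Note that the second rule only fires when a digital source exists, so a file without any digitalSource gets just the first message.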

@ingoboerner
Collaborator Author

OK, I would propose the following:

  1. Proceed with the base schema/odd as agreed above.
  2. Define the "feature" (see D7.1 Report On Programmable Corpora) inside the ODD, e.g.
<div xml:id="play_id">
                        <head>Play ID</head>
                        <p>Feature <idno type="feature-no">P2</idno> <idno type="feature-id">play_id</idno>: <name>DraCor ID</name> of the play, e.g. <val>ger000171</val>.</p>
                        <p>In the TEI source file the <name>DraCor ID</name> is contained in the attribute <att>xml:id</att> on the root element <gi>TEI</gi>.</p>
                        <p>The identifier SHOULD match the Regular Expression <val>^[a-z]+[0-9]{6}$</val>.</p>
                    </div>
  3. Add Schematron rules to check whether the API will be able to return data for a feature, e.g.
<constraintSpec ident="valid_dracor_ids_on_root_tei_element"
                            scheme="schematron" mode="add" corresp="#play_id">
                            <desc>DraCor identifiers should consist of lower case letters followed by a six-digit number. The value is returned as feature
                            <ref target="#play_id">play_id</ref> in the API response object.</desc>
                            <constraint>
                                <sch:rule context="tei:TEI" role="warning">
                                    <sch:assert test="matches(./@xml:id,'^[a-z]+[0-9]{6}$')"> For
                                        DraCor IDs we recommend the pattern ^[a-z]+[0-9]{6}$
                                    </sch:assert>
                                </sch:rule>
                            </constraint>
                        </constraintSpec>

The result in the rendered HTML ODD:
[Screenshot, 2024-06-19 13:39:43]

The Schematron rule links to the feature via <ref>/@corresp:
[Screenshot, 2024-06-19 13:40:04]

The generated RelaxNG contains the Schematron rules and can be used in Oxygen to validate a file. In the example it now produces a warning:
[Screenshot, 2024-06-19 13:41:08]

@ingoboerner
Collaborator Author

There is another, additional option to check whether a TEI file supports certain API features.
In <schemaSpec> we can include <constraintSpec> elements with Schematron rules that explicitly report (!), rather than assert, that a certain condition in the encoding is met.
An example: if the file contains <title type="main">Whatever Main Title</title>, the API will be able to return the title info in the response objects. We can now include a Schematron rule/constraintSpec that checks exactly for that and reports that the feature is supported:
[Screenshot, 2024-06-26 15:02:56]

[Screenshot, 2024-06-26 15:03:58]

If it is not supported, I provide a "Warning", which might help encoders to add the elements that are needed for the feature to be supported:

[Screenshot, 2024-06-26 15:04:50]
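
Since the rules themselves are only visible in the screenshots, here is a rough sketch of how such a report/assert pair could be written. The ident feature_main_title, the context path, the role values and the message texts are assumptions for illustration, not the rules from my draft; how role="information" is displayed depends on the validator:

```xml
<!-- Sketch only: hypothetical ident, context and messages -->
<constraintSpec xmlns="http://www.tei-c.org/ns/1.0"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                ident="feature_main_title" scheme="schematron" mode="add">
  <desc>Reports whether the API can return the main title of the play.</desc>
  <constraint>
    <sch:rule context="tei:fileDesc/tei:titleStmt">
      <!-- report fires when the test is TRUE: the feature is supported -->
      <sch:report test="tei:title[@type = 'main']" role="information">Feature
        supported: a main title is available.</sch:report>
      <!-- assert fires when the test is FALSE: tell the encoder what is missing -->
      <sch:assert test="tei:title[@type = 'main']" role="warning">Feature not
        supported: add a title element with type="main".</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```

Using the same test in both elements means every file gets exactly one of the two messages for this feature.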
